Fix aarch64 bitmask immediate encoding and implement some aarch64 instructions #1458

Merged


DukMastaaa
Contributor

@DukMastaaa DukMastaaa commented Apr 5, 2022

File restructuring

To manage growth of file sizes when implementing more of the ARMv8 ISA, this PR splits the instructions implemented in aarch64.lisp into smaller files based on category:

  • arithmetic
  • atomic (CAS, CSEL and the like)
  • branch
  • data movement (load, store, move)
  • logical
  • processor state
  • special (barriers, etc)

A full list of the instructions belonging to each category has been developed here.

aarch64 Bitmask Immediate decoding

Before this PR, running bap mc --arch=aarch64 --show-bil -- 20 00 7f 92 (assembly instruction and x0, x1, 2) incorrectly lifted the instruction:

{
  R0 := R1 & 0x7C0
}

The constant 0x7C0 appears in place of 2 due to ARMv8's "bitmask immediates" which encode immediate values for the logical instructions AND, ORR and EOR. This PR implements helper functions in Primus Lisp which decode these immediates using the algorithm from the ARMv8 ISA pseudocode, and updates the semantics of those logical instructions. Helper functions specific for aarch64 such as these have been moved to aarch64-helper.lisp.
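To make the decoding concrete, here is a minimal Python sketch of the bitmask-immediate algorithm applied to the example bytes above. It is an illustration only, not the Primus Lisp code from this PR; the function names `ror`, `decode_bitmask` and `logical_imm_fields` are hypothetical, and only the immediate mask is computed.

```python
def ror(value, amount, width):
    """Rotate the low `width` bits of `value` right by `amount` bits."""
    amount %= width
    mask = (1 << width) - 1
    value &= mask
    return ((value >> amount) | (value << (width - amount))) & mask


def decode_bitmask(n, immr, imms, reg_width):
    """Decode the N:immr:imms fields into a logical immediate (hypothetical helper)."""
    # The element size comes from the highest set bit of N:NOT(imms).
    length = ((n << 6) | (~imms & 0x3F)).bit_length() - 1
    if length < 1:
        raise ValueError("reserved encoding")
    size = 1 << length
    if size > reg_width:
        raise ValueError("element wider than the register")
    r = immr & (size - 1)
    s = imms & (size - 1)
    if s == size - 1:
        raise ValueError("reserved encoding")
    # s + 1 ones, rotated right by r inside one element.
    element = ror((1 << (s + 1)) - 1, r, size)
    # Replicate the element until it fills the register.
    result = 0
    for shift in range(0, reg_width, size):
        result |= element << shift
    return result


def logical_imm_fields(word):
    """Extract (N, immr, imms) from a 32-bit logical-immediate instruction word."""
    return (word >> 22) & 1, (word >> 16) & 0x3F, (word >> 10) & 0x3F


# Bytes of `and x0, x1, 2` as passed to bap mc (little-endian).
word = int.from_bytes(bytes.fromhex("20007f92"), "little")
n, immr, imms = logical_imm_fields(word)
print(n, immr, imms)                           # 1 63 0
print(hex(decode_bitmask(n, immr, imms, 64)))  # 0x2
```

With N=1 the element is 64 bits wide, so a single set bit rotated right by 63 lands on bit 1, which gives the expected immediate 2.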

Barriers

This PR represents the DMB, DSB and ISB barriers as external function calls using the new (special) primitive implemented in #1410. For example, running bap mc --arch=aarch64 --show-bil --show-bir -- bf 3b 03 d5 (instruction dmb ish) gives

{
  call(special:barrier:dmb:ish)
}
0000000c:
0000000b: goto @special:barrier:dmb:ish

Load-acquire and store-release

This PR represents load-acquire and store-release semantics using the (special) primitive like barriers described above. It's used in the implementation of CAS* instructions. For example, running bap mc --arch=aarch64 --show-bil -- 41 fc e0 88 (instruction casal w0, w1, [x2]) gives

{
  #0 := mem[R2, el]:u32
  call(special:load-acquire)
  #1 := #0 = extract:31:0[R0]
  if (#1) {
    call(special:store-release)
    mem := mem with [R2, el]:u32 <- low:32[extract:31:0[R1]]
  }
  R0 := #0
}

All CAS* instructions except the CASP* family are implemented in this PR.
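The BIL above can be read as the following Python model. This is a sketch under simplifying assumptions: memory is a dict keyed by address holding whole 32-bit words, registers are a dict of 64-bit values, and the `special` calls are only recorded in a trace list. The name `casal_w` and its parameters are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def casal_w(regs, mem, rs, rt, rn, trace):
    """Model of `casal w<rs>, w<rt>, [x<rn>]` following the lifted BIL."""
    addr = regs[rn]
    old = mem[addr] & MASK32                 # #0 := mem[R2, el]:u32
    trace.append("special:load-acquire")
    if old == (regs[rs] & MASK32):           # #1 := #0 = extract:31:0[R0]
        trace.append("special:store-release")
        mem[addr] = regs[rt] & MASK32        # store the low 32 bits of Rt
    regs[rs] = old                           # R0 := #0 (assumed zero-extended)
    return old


# Comparison succeeds: memory is updated and Rs keeps the old value.
regs = {0: 5, 1: 9, 2: 0x1000}
mem = {0x1000: 5}
trace = []
casal_w(regs, mem, 0, 1, 2, trace)
print(mem[0x1000], regs[0], trace)
# 9 5 ['special:load-acquire', 'special:store-release']
```

When the comparison fails, only the load-acquire marker is recorded and memory is left untouched, while Rs still receives the value that was read.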

Other instructions

Here are all the instructions that have been implemented in this PR.

  • Arithmetic: SDIV, UDIV, MADD, MSUB
  • Atomic: CSEL, CSINC, CSINV, CSNEG, CAS* except CASP*
  • Branch: TBZ, TBNZ
  • Data movement: STUR, STURH, STURB
  • Logical: AND*ri, ORR*ri, EOR*ri
  • Processor state: CCMN, CCMP, RMIF, SETF8, SETF16
  • Special: DMB, DSB, ISB

Possible issues

Update: the below issues have been fixed with #1461; these instructions now work correctly.

LLVM can't seem to disassemble the RMIF, SETF8 and SETF16 instructions (try bap-mc --show-bil --arch=aarch64 -- "00 04 00 ba" corresponding to rmif x0, 0, 0). These three are part of ARMv8.4; perhaps LLVM does not implement this version? Since --show-knowledge output gives no information, the LLVM opcodes for these instructions in aarch64-pstate.lisp may be incorrect, and the semantics have not been tested yet.

Another ARMv8.4 instruction, CFINV, does get disassembled but gets turned into MSR (register) even though CFINV isn't an alias. --show-knowledge reports that LLVM returned msr 0x200 XZR for CFINV but this isn't even a valid opcode; both arguments of MSR (register) need to be registers. Not sure what's going on here.

DukMastaaa and others added 26 commits February 4, 2022 01:53
separated into category files
LLVM can't seem to disassemble ARMv8.4 instructions like RMIF, SETF8
and SETF16. Also, CFINV gets turned into MSR (register) but LLVM
returns ill-formed asm...?
I've commented this in aarch64-pstate.lisp.
i typed is_zero with underscore instead of primitive is-zero
documentation added for macros and helper functions.
llvm mnemonics most likely incorrect, will investigate
why bap's llvm doesn't disassemble these insns
i've used ` bap mc --cpu=cortex-a55 --triple=aarch64`
to get the llvm mnemonic, but will need to talk to ivan
about lisp context and
specifying generic armv8.x instead of a specific cpu
@ivg
Member

ivg commented Apr 11, 2022

The tests are failing with `Unresolved function or primitive aarch64:CASordXr` and indeed I can only find a definition of `CASordX`. Maybe a typo?

Besides, you can run tests using make test

@ivg
Member

ivg commented Apr 11, 2022

See #1461, it should enable instructions such as RMIF, SETF8, CASX, and others that are available only in armv8.1a through v8.6a. Also, if you enable the "allow edits from the maintainers" checkbox on the pull request, then I can push fixes directly to your branch. But if you have reasons to keep it unchecked, then no worries, I can create pull requests.

@DukMastaaa
Contributor Author

Here are some things I noticed while fixing up the code:

  1. Can the code for SETF8 be optimised in any way? Currently, the code outputs the following BIL for setf8 w0:
{
  NF := extract:7:7[extract:31:0[R0]]
  ZF := extract:7:0[extract:31:0[R0]] = 0
  VF := extract:8:8[extract:31:0[R0]] ^ extract:7:7[extract:31:0[R0]]
}

Maybe we can move the extract:7:0 into a local variable or something?

  2. Is the output of the bitmask immediate code correct? Given the instruction and w0, w1, 3, BIL output is
{
  R0 := extend:64[extract:31:0[R1]] & 0x300000003
}

Just wanting to confirm that the leading 3 there will still zero out the upper 32 bits of R0, since we've zero-extended the bottom 32 bits then applied AND to the mask. This weird 30000... occurs for all immediate values with W registers but not X.

  3. Every time I invoke bap mc to test lifter output, it tells me that '+v8.6a' is not a recognized feature for this target (ignoring feature). Is this anything to be worried about?
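On the question about the mask 0x300000003: a quick Python check of the arithmetic suggests that AND happens to be unaffected, because the operand is zero-extended first, but an OR with the same mask would set bits above bit 31. This is a sketch only; it assumes ORR is lifted in the same shape as the AND shown above, which is not confirmed here.

```python
MASK32 = 0xFFFFFFFF
MASK_REPLICATED_64 = 0x300000003  # immediate replicated to 64 bits
MASK_REPLICATED_32 = 0x3          # immediate replicated to 32 bits only


def zext32(x):
    """extend:64[extract:31:0[x]]: keep the low 32 bits, zero the rest."""
    return x & MASK32


r1 = 0xDEADBEEFFFFFFFFF

# AND: the zero-extended operand already clears the upper half.
print(hex(zext32(r1) & MASK_REPLICATED_64))  # 0x3
print(hex(zext32(r1) & MASK_REPLICATED_32))  # 0x3

# OR (assumed to be lifted the same way): the extra 3 leaks into bits 32-33.
print(hex(zext32(r1) | MASK_REPLICATED_64))  # 0x3ffffffff
print(hex(zext32(r1) | MASK_REPLICATED_32))  # 0xffffffff
```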

@ivg
Member

ivg commented Apr 12, 2022

3. Every time I invoke bap mc to test lifter output, it tells me that '+v8.6a' is not a recognized feature for this target (ignoring feature). Is this anything to be worried about?

It looks like v8.6a is not supported by your version of LLVM. We need to be more careful when adding the features and take the LLVM version into account; I will work on the fix.

2. Is the output of the bitmask immediate code correct? Given the instruction and w0, w1, 3, BIL output is

Yep, it looks like it doesn't work for the 32-bit versions of the instructions for some reason. See how LLVM encodes them:

echo "and w0, w1, 3" | llvm-mc -triple aarch64 --show-encoding --show-inst
	.text
	and	w0, w1, #0x3                    // encoding: [0x20,0x04,0x00,0x12]
                                        // <MCInst #784 ANDWri
                                        //  <MCOperand Reg:186>
                                        //  <MCOperand Reg:187>
                                        //  <MCOperand Imm:1>>
                                        

vs.

echo "and x0, x1, 3" | llvm-mc -triple aarch64 --show-encoding --show-inst
	.text
	and	x0, x1, #0x3                    // encoding: [0x20,0x04,0x40,0x92]
                                        // <MCInst #786 ANDXri
                                        //  <MCOperand Reg:217>
                                        //  <MCOperand Reg:218>
                                        //  <MCOperand Imm:4097>>

So it looks like ANDWri and ANDXri have different encodings, so we can represent both with log*ri. I am currently trying to figure out what kind of encoding is used for the W versions.

So I looked into the LLVM code, and they pass the bitness to their decoding function

/// decodeLogicalImmediate - Decode a logical immediate value in the form
/// "N:immr:imms" (where the immr and imms fields are each 6 bits) into the
/// integer value it represents with regSize bits.
static inline uint64_t decodeLogicalImmediate(uint64_t val, unsigned regSize) {
  // Extract the N, imms, and immr fields.
  unsigned N = (val >> 12) & 1;
  unsigned immr = (val >> 6) & 0x3f;
  unsigned imms = val & 0x3f;

  assert((regSize == 64 || N == 0) && "undefined logical immediate encoding");
  int len = 31 - countLeadingZeros((N << 6) | (~imms & 0x3f));
  assert(len >= 0 && "undefined logical immediate encoding");
  unsigned size = (1 << len);
  unsigned R = immr & (size - 1);
  unsigned S = imms & (size - 1);
  assert(S != size - 1 && "undefined logical immediate encoding");
  uint64_t pattern = (1ULL << (S + 1)) - 1;
  for (unsigned i = 0; i < R; ++i)
    pattern = ror(pattern, size);

  // Replicate the pattern to fill the regSize.
  while (size != regSize) {
    pattern |= (pattern << size);
    size *= 2;
  }
  return pattern;
}

Maybe it is better to implement it using the LLVM approach?
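For experimenting with the operands printed by llvm-mc above (Imm:1 and Imm:4097), the LLVM routine can be transliterated into Python. This is a sketch of the quoted C++ and not code from BAP; the snake_case name is an adaptation.

```python
def decode_logical_immediate(val, reg_size):
    """Python transliteration of the LLVM decodeLogicalImmediate shown above."""
    # Extract the N, immr and imms fields from the packed N:immr:imms operand.
    n = (val >> 12) & 1
    immr = (val >> 6) & 0x3F
    imms = val & 0x3F

    assert reg_size == 64 or n == 0, "undefined logical immediate encoding"
    # Index of the highest set bit, same as 31 - countLeadingZeros(...).
    length = ((n << 6) | (~imms & 0x3F)).bit_length() - 1
    assert length >= 0, "undefined logical immediate encoding"
    size = 1 << length
    r = immr & (size - 1)
    s = imms & (size - 1)
    assert s != size - 1, "undefined logical immediate encoding"
    pattern = (1 << (s + 1)) - 1
    for _ in range(r):
        # Rotate right by one bit within a `size`-bit element.
        pattern = (pattern >> 1) | ((pattern & 1) << (size - 1))

    # Replicate the pattern to fill reg_size bits.
    while size != reg_size:
        pattern |= pattern << size
        size *= 2
    return pattern


print(hex(decode_logical_immediate(1, 32)))     # 0x3  (ANDWri operand)
print(hex(decode_logical_immediate(4097, 64)))  # 0x3  (ANDXri operand)
print(hex(decode_logical_immediate(1, 64)))     # 0x300000003
```

The last call decodes the W-form operand as if the register were 64 bits wide, which reproduces the 0x300000003 constant from the BIL output in question 2.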

1. Can the code for SETF8 be optimised in any way? Currently, the code outputs the following BIL for setf8 w0:

Are you talking about nesting extracts that could be optimized away?
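As a small illustration of why the nested extracts are redundant, the Python sketch below computes the SETF8 flags both ways and checks that they agree. The helpers `extract`, `setf8_nested` and `setf8_flat` are hypothetical names used only for this example; they mirror the BIL expressions above rather than any BAP API.

```python
def extract(hi, lo, x):
    """BIL-style extract:hi:lo[x] on a non-negative integer."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def setf8_nested(r0):
    """Flags as in the lifted BIL: every extract goes through extract:31:0."""
    w = extract(31, 0, r0)
    nf = extract(7, 7, w)
    zf = int(extract(7, 0, w) == 0)
    vf = extract(8, 8, w) ^ extract(7, 7, w)
    return nf, zf, vf


def setf8_flat(r0):
    """Same flags with the inner extract:31:0 dropped."""
    nf = extract(7, 7, r0)
    zf = int(extract(7, 0, r0) == 0)
    vf = extract(8, 8, r0) ^ nf
    return nf, zf, vf


for value in (0x0, 0x80, 0x100, 0x180, 0xFFFFFFFF_000001FF):
    assert setf8_nested(value) == setf8_flat(value)

print(setf8_flat(0x180))  # (1, 0, 0)
print(setf8_flat(0x100))  # (0, 1, 1)
```

Because bits 8 and below all lie inside the low 32 bits, extracting them from the full register gives the same result as extracting from the truncated value.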

ivg added 6 commits April 12, 2022 17:12
The Primus Lisp semantic primitives were hardcoding 64-bit
arithmetic, which was obviously incorrect. In addition, the
shifting operations were coercing the operands to the same size,
like in arithmetic operations, which contradicts the established
semantics of shifts both in Core Theory and in BIL. Now, the shifting
operators will produce values of the same sort as the sort of the
first operand.
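The shift rule in this commit message can be pictured with a toy bitvector model in Python. This is an illustration only: `bv`, `shl` and `shl_coerced` are hypothetical names, and the old behaviour is modelled here as widening both operands to the larger width, which is an assumption about how the coercion worked.

```python
def bv(value, width):
    """A bitvector as a (value, width) pair, truncated to its width."""
    return value & ((1 << width) - 1), width


def shl(x, n):
    """Shift left; the result has the sort (width) of the first operand."""
    (xv, xw), (nv, _) = x, n
    return bv(xv << nv, xw)


def shl_coerced(x, n):
    """Assumed old behaviour: widen both operands to a common width first."""
    (xv, xw), (nv, nw) = x, n
    return bv(xv << nv, max(xw, nw))


x = bv(0x80000001, 32)
n = bv(1, 64)
print(tuple(map(hex, shl(x, n))))          # ('0x2', '0x20')
print(tuple(map(hex, shl_coerced(x, n))))  # ('0x100000002', '0x40')
```

With the first-operand rule the top bit of the 32-bit value is shifted out, whereas the coerced variant keeps it in a 64-bit result.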
@ivg
Member

ivg commented Apr 12, 2022

PR UQ-PAC#4 will resolve the conflicts and will make the decoding closer. It looks like we need to properly cast the immediate to 32 bits to get the correct result, though I am not very sure. Passing on the baton to you)

 updates to changes in BAP, improving `clz`
the result in decode-bit-masks should only be
replicated to the width of the registers being assigned to.
so, we need to pass in the register width when decoding.
@DukMastaaa
Contributor Author

Thanks for the PR, it's merged in now.
After comparing the ISA pseudocode to the LLVM decoding code you sent, the issue was that the result needs to be replicated only to fill the width of the register we're operating on. I had hardcoded memory-width as 64, but I've now modified the functions to take register-width as an argument, like LLVM does, and it works correctly.

Member

@ivg ivg left a comment

LGTM, thanks!

@ivg ivg merged commit e9bd12f into BinaryAnalysisPlatform:master Apr 21, 2022
@ivg
Member

ivg commented Apr 21, 2022

Thanks again, guys, for an awesome contribution!

kit-ty-kate pushed a commit to ocaml/opam-repository that referenced this pull request Jul 14, 2022
2.5.0
=====

- BinaryAnalysisPlatform/bap#1390 adds the flattening pass to the library interface
- BinaryAnalysisPlatform/bap#1389 adds `insn-code` to the `Theory.Semantics` class
- BinaryAnalysisPlatform/bap#1394 adds the `Bitvec.modular` function
- BinaryAnalysisPlatform/bap#1395 adds LLVM 13/14 compatibility
- BinaryAnalysisPlatform/bap#1408 adds support for mips64el target
- BinaryAnalysisPlatform/bap#1409 adds the `--print-missing` option to print unlifted instructions
- BinaryAnalysisPlatform/bap#1410 adds several new Primus Lisp primitives and new instructions
- BinaryAnalysisPlatform/bap#1428 adds the monad choice interface to the knowledge base
- BinaryAnalysisPlatform/bap#1429 refines the `Theory.Target.matches` and adds the `matching` function
- BinaryAnalysisPlatform/bap#1434 adds arm unpredicated BL instruction
- BinaryAnalysisPlatform/bap#1444 adds the x86/amd64 plt corrector component to the Primus base system
- BinaryAnalysisPlatform/bap#1445 updates the `Sub.compute_liveness` function to handle SSA form
- BinaryAnalysisPlatform/bap#1446 provides the new liveness analysis
- BinaryAnalysisPlatform/bap#1452 implements pcode floating-point and special operators
- BinaryAnalysisPlatform/bap#1457 adds optional `join` for `Knowledge.Domain.mapping`
- BinaryAnalysisPlatform/bap#1461 enables v8.{1,2,3,4,5,6}a revisions for the aarch64 target
- BinaryAnalysisPlatform/bap#1464 adds arbitrary-precision loopless clz and popcount to Primus Lisp
- BinaryAnalysisPlatform/bap#1460 adds compatibility with Core_kernel >= 0.15
- BinaryAnalysisPlatform/bap#1466 adds semantics for the x86 SSE floating-point instructions
- BinaryAnalysisPlatform/bap#1469 adds the jump destination addresses/names to the assembly output
- BinaryAnalysisPlatform/bap#1458 adds more aarch64 instructions
- BinaryAnalysisPlatform/bap#1473 adds an `--arm-features` command-line option
- BinaryAnalysisPlatform/bap#1476 implements the naming scheme for interrupts
- BinaryAnalysisPlatform/bap#1479 reifies external subroutines and intrinsics into I
- BinaryAnalysisPlatform/bap#1482 enables BIR subroutines without an explicit return
- BinaryAnalysisPlatform/bap#1481 enables disabling the patterns plugin
- BinaryAnalysisPlatform/bap#1483 implements floating-point intrinsic subroutines
- BinaryAnalysisPlatform/bap#1488 adds compatibility with OCaml 4.14 and Core v0.15
- BinaryAnalysisPlatform/bap#1489 adds some missing functionality to Primus Lisp POSIX
- BinaryAnalysisPlatform/bap#1490 adds some missing C POSIX APIs
- BinaryAnalysisPlatform/bap#1492 makes bit-twiddling operations easier to read and analyze
- BinaryAnalysisPlatform/bap#1493 adds smart constructors and destructors to the C types library
- BinaryAnalysisPlatform/bap#1491 adds semantics for the x86-64 `popq` instruction
- BinaryAnalysisPlatform/bap#1497 extends the C.Abi library
- BinaryAnalysisPlatform/bap#1498 adds the extended lvalue assignment to Primus Interpreter
- BinaryAnalysisPlatform/bap#1499 makes BIL smart constructors smart
- BinaryAnalysisPlatform/bap#1500 makes argument passing well-typed
- BinaryAnalysisPlatform/bap#1503 reimplements C types printing functions
- BinaryAnalysisPlatform/bap#1504 extends the demanglers library to the new targets infrastructure
- BinaryAnalysisPlatform/bap#1505 rewrites x86 abi using the new infrastructure
- BinaryAnalysisPlatform/bap#1511 implements some missing Thumb instructions
- BinaryAnalysisPlatform/bap#1513 implements the x86_64 padd instructions
- BinaryAnalysisPlatform/bap#1515 allows target overriding
- BinaryAnalysisPlatform/bap#1516 adds armv8 BFM instructions
- BinaryAnalysisPlatform/bap#1517 publishes Theory.Target.nicknames and extends Primus Contexts
- BinaryAnalysisPlatform/bap#1519 extends Core Theory with target registration and lookup
- BinaryAnalysisPlatform/bap#1520 adds the high-level calling convention specification language
- BinaryAnalysisPlatform/bap#1521 reimplements x86 targets using the new infrastructure
- BinaryAnalysisPlatform/bap#1522 reimplements ARM ABI and target specification
- BinaryAnalysisPlatform/bap#1523 rewrites mips targets and abi
- BinaryAnalysisPlatform/bap#1524 adds C data type layout
- BinaryAnalysisPlatform/bap#1525 adds the pass by reference argument passing method
- BinaryAnalysisPlatform/bap#1526 restructures powerpc targets and reimplements ppc32 eabi
- BinaryAnalysisPlatform/bap#1529 makes the ABI processors usable programmatically

- BinaryAnalysisPlatform/bap#1391 fixes ARM/Thumb `movt` semantics
- BinaryAnalysisPlatform/bap#1396 fixes the path plugin loader path handling
- BinaryAnalysisPlatform/bap#1414 fixes the pc value in pc-relative thumb ldr
- BinaryAnalysisPlatform/bap#1420 fixes the low-level Disasm_expert.Basic.create function
- BinaryAnalysisPlatform/bap#1421 fixes the core-theory plugin semantics tags
- BinaryAnalysisPlatform/bap#1426 fixes arm predication
- BinaryAnalysisPlatform/bap#1438 reads correctly unqualified system names
- BinaryAnalysisPlatform/bap#1439 fixes a bug in the KB update function, adds new functions
- BinaryAnalysisPlatform/bap#1448 fixes an accidental dependency on the bap-traces internal module
- BinaryAnalysisPlatform/bap#1449 fixes unconditional pop with return in thumb
- BinaryAnalysisPlatform/bap#1455 fixes register assignments in p-code semantics
- BinaryAnalysisPlatform/bap#1462 fixes the `cast-signed` Primus Lisp primitive
- BinaryAnalysisPlatform/bap#1463 fixes the arithmetic modulus in Primus Lisp primitives
- BinaryAnalysisPlatform/bap#1465 fixes handling of `jmp term`s in the flatten pass
- BinaryAnalysisPlatform/bap#1467 fixes a sporadic internal error in the cache garbage collector
- BinaryAnalysisPlatform/bap#1468 fixes the relocation symbolizer incorrect handling of intrinsics
- BinaryAnalysisPlatform/bap#1458 fixes aarch64 bitmask immediate encoding
- BinaryAnalysisPlatform/bap#1486 fixes type unification on binary operation application
- BinaryAnalysisPlatform/bap#1485 fixes little-endian MIPS disassembling
- BinaryAnalysisPlatform/bap#1494 fixes the encoding of the comparison operators
- BinaryAnalysisPlatform/bap#1496 fixes registers allocation in the abi specification DSL
- BinaryAnalysisPlatform/bap#1502 fixes the bitvector order function
- BinaryAnalysisPlatform/bap#1528 fixes armv4t name that was missing the arm prefix

- BinaryAnalysisPlatform/bap#1393 improves the Primus Lisp documentation generator
- BinaryAnalysisPlatform/bap#1397 fixes the macOS CI build
- BinaryAnalysisPlatform/bap#1399 updates the url of the testing repo to use the encrypted version
- BinaryAnalysisPlatform/bap#1432 updates the docker image
- BinaryAnalysisPlatform/bap#1435 selects specific llvm components for linking
- BinaryAnalysisPlatform/bap#1447 updates to the git+https in the dockerfiles
- BinaryAnalysisPlatform/bap#1470 corrects linking of Unix library in configure
- BinaryAnalysisPlatform/bap#1478 fixes the opam/opam dev-repo protocol which broke the release action
- BinaryAnalysisPlatform/bap#1480 adds an automation to build a docker image for the latest release
- BinaryAnalysisPlatform/bap#1514 adds the mmap dependency

- BinaryAnalysisPlatform/bap#1386 adds missing ARM target ABI information
- BinaryAnalysisPlatform/bap#1388 adds aliasing information for x86
- BinaryAnalysisPlatform/bap#1392 adds an option to directly use ogre files as a loader
- BinaryAnalysisPlatform/bap#1398 provides the assembly string as a promise (removes #undefined)
- BinaryAnalysisPlatform/bap#1400 improves the computation of the instruction properties
- BinaryAnalysisPlatform/bap#1401 improves the KB.Value merge operation
- BinaryAnalysisPlatform/bap#1402 moves promises and theories into the core-theory plugin
- BinaryAnalysisPlatform/bap#1403 moves knowledge base rules from the library to the plugin
- BinaryAnalysisPlatform/bap#1404 improves the performance of the byte patterns matcher (1/3)
- BinaryAnalysisPlatform/bap#1405 improves the performance of bitvectors (2/3)
- BinaryAnalysisPlatform/bap#1411 [optimization] do not store empty objects in the knowledge base
- BinaryAnalysisPlatform/bap#1412 updates the KB version number and adds a few more microoptimizations
- BinaryAnalysisPlatform/bap#1413 updates bap to latest OCaml, switches to newer bitstrings
- BinaryAnalysisPlatform/bap#1415 switches to patricia trees in the KB implementation
- BinaryAnalysisPlatform/bap#1416 Reimplements x86 bitscan and popcnt
- BinaryAnalysisPlatform/bap#1418 uses the builtin clz function from base, instead of the custom one
- BinaryAnalysisPlatform/bap#1417 relaxes the speculative disassembler constraints
- BinaryAnalysisPlatform/bap#1419 allows bapbuild to work when bap and other defaults are not present
- BinaryAnalysisPlatform/bap#1422 relaxes interpreters to allow ill-typed operations
- BinaryAnalysisPlatform/bap#1425 applies ARM modified immediate (MIC) decoding in more places
- BinaryAnalysisPlatform/bap#1423 reimplements clz using the branchless/loopless algorithm
- BinaryAnalysisPlatform/bap#1427 removes unnecessary units from the knowledge base
- BinaryAnalysisPlatform/bap#1430 refines and extends target definitions
- BinaryAnalysisPlatform/bap#1431 partially upgrades byteweight to work with the modern bap
- BinaryAnalysisPlatform/bap#1441 uses Allen's Interval Algebra in the KB.Value merge implementation
- BinaryAnalysisPlatform/bap#1442 wraps proposals into with_empty and adds more guards
- BinaryAnalysisPlatform/bap#1443 adds subinstruction contraction to improve the ghidra lifter output
- BinaryAnalysisPlatform/bap#1433 adds mode events to traces
- BinaryAnalysisPlatform/bap#1450 hushes bil lifters
- BinaryAnalysisPlatform/bap#1451 removes falls-through from unconditional branches in IR reification
- BinaryAnalysisPlatform/bap#1454 improves the setw function used
- BinaryAnalysisPlatform/bap#1456 removes Thumb2 branches from the legacy ARM lifter
- BinaryAnalysisPlatform/bap#1471 uses function starts as the entries when building the symtab
- BinaryAnalysisPlatform/bap#1472 improves disassembler performance
- BinaryAnalysisPlatform/bap#1475 unifies name generation for IR subroutines
- BinaryAnalysisPlatform/bap#1477 removes the special Primus Lisp primitive
- BinaryAnalysisPlatform/bap#1484 disables byteweight
- BinaryAnalysisPlatform/bap#1487 reduces memory footprint
- BinaryAnalysisPlatform/bap#1501 makes all C data type sizes a multiple of their alignment
- BinaryAnalysisPlatform/bap#1506 optimizes encoding computation for x86
- BinaryAnalysisPlatform/bap#1510 adds an example on how to create a monad transformer stack (#1354)
- BinaryAnalysisPlatform/bap#1518 uses signed casts for promoting arguments
- BinaryAnalysisPlatform/bap#1530 turns x86 endbr instructions into nops
- BinaryAnalysisPlatform/bap#1531 adds patterns to recognize certain x86 endbr as function starts
- BinaryAnalysisPlatform/bap#1532 improves the main subroutine discovery within glibc runtime
- BinaryAnalysisPlatform/bap#1535 prevents knowledge conflicts on mangled names