-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM opcode & operand encoding information (based on auto-sync) #2045
Conversation
- try fix `test_corpus.py`
- `DecodeABSInstruction` - `DecodeRLCInstruction` - `DecodeSSRInstruction` - `DecodeSRCInstruction`
- Rename `ISLR_post_increment` to `ISLR_pos` for clarity - Fix register decoding for TriCore architecture in `TriCoreDisassembler.c` - Add new file `LoadStore.s.cs` to `suite/MC/TriCore`
Incorrect SP encoding
For better understanding purposes
Please fix fuzzing bugs as well:
|
Pretty sure this has to do with the commit onto which I rebased. Shouldn't be an issue on future rebase |
I'll do a second round of review soon. It would be nice, because when printing the details we could also print the variable names of each operand. If every encoding piece has an |
Each OpNum doesn't necessarily correspond to a single piece. There is only one problem and that's why I added my second commit. |
include/capstone/capstone.h
Outdated
/// The bit positions of each piece that form the full operand in order. If | ||
/// there is only one piece then there is only one index as well. Likewise | ||
/// if there are 4 pieces, there are 4 indexes and so on. | ||
/// Also note that these indices DON'T necessarily match the indices in the ISA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should always save bits at the same index as the ISA does. Otherwise there is no reference point for people.
@AngelDev06 Just rebased #2026. You shouldn't get conflicts anymore. |
Ok, so there are some more changes coming (nothing ARM related). You might as well also just wait with the rebase. I can give you the go then some when later this month, if this is fine for you. |
Is it fine for you, if I cherry-pick your commits into the auto-sync PR? It is easier to work on it this way. Also it would probably be merged faster. |
I might be on holidays for about two weeks starting from 15th of this month so I might not be able to rebase. But in the worst case senario, I will be available either at the end of this month or at the start of the next one. Yes you can cherry pick, although I can still rebase onto your changes of the arm64 pr if you want. Also note that I will try to fix the indexes to be according to the ARM documentation & also open a brand new pr on llvm-capstone that includes my changes. Up to you if you want me to rebase now or wait for more changes to come as you mentioned. |
My problem is that So I would cherry pick them then. Especially if you are on vacation (wish you a nice one btw).
If you manage to do this before your vacation it would be great. Or does it change a lot?
Sure. If this happens after #2026 it should be fine as well. |
I can do some small patches to my current clone of
Sure. Is my work going to be included as part of #2026 or merged directly from here?
|
Regarding the encoding index changes
Sorry for letting this PR stall for so long. I looked into it quite a while, and we discussed this internally as well. The problem is not, that we don't need it, but that the current implementation doesn't work for other architectures. As you know, you had to do quite some manipulations of the encodings in While these points would be acceptable for a single architecture module, we cannot do it for all the others. Due to the faulty encoding information in LLVM (and ARM is actually not the worst here), this is not maintainable without extra effort. However, it would be great if you would keep this feature up to date in your repository, so we can refer to it if other people need it for ARM. Also, we plan to add a generator for the SAIL definitions of several architectures. They should have way less flawed encoding information. Thank you a lot for the effort you put into it! And I am sorry I figured out too late, that it is not transferable to the other architecture modules. |
It's understandable. Sadly I can't easily keep up my fork to date since I barely have any time for my own project, so I will probably just leave it as is for the time being. I truly hope this project gets more maintainers in the future since it has a lot of potential and I personally, couldn't find a better disassembler for my needs. |
I implemented two new structures.
The first one being:
and the second one being:
where both are defined in capstone.h and are available for use by all auto-sync archs. An instance of the first struct is provided as part of the detail member as shown here:
while the instance of the second struct is provided directly as part of the arch's operand (detail only) such as
cs_arm_op
. Currently only the ARM arch fills these structs with info as this is the only arch I have generated the required tables for. However this is entirely based on auto-sync so once I publish my modified copy of llvm-capstone (containing the function used to generate the encoding strings), it should be easy to generate the tables for the rest of the auto-sync archs.ARMGenCSMappingInsn.inc
file (and is also wrapped under CAPSTONE_DIET) in the functionARM_set_instr_map_data
(which is invoked right after the instruction is decoded).ARMGenCSMappingInsnOp.inc
file (like the rest of the operand info) and only afterARM_add_cs_detail
is called. The mapping operation takes place on functions such asARM_set_detail_op_reg
andARM_set_detail_op_imm
and in some special cases it's hardcoded inside the rest of theadd_cs_detail
functions that are implemented (such asadd_cs_detail_general
). Note that for the reglist operand, since capstone breaks it down to a list of register operands (each one being a seperatecs_arm_op
), I proceeded to do the same for the encoding. I added a useful comment in that part.A test I did myself on a few instructions to verify that it works includes the following output:
and yes I verified the results from the official ARM documentation. More info:
Let me know if there are any issues.