Basic RISCV support #1198
… the TableGen files generated from llvm-tblgen. Add Disassembler.h
…ler_getInstruction, and RISCV_getInstruction
…o RISCVGenDisassemblerTables.inc. Add and modified RISCVGenSubtargetInfo.inc. Start creation of RISCVInstPrinter.h
…nor fixes to RISCVDisassembler.c. Working on RISCVInstPrinter
…Info.inc, RISCVModule.c. Working on riscv.h
…DDI, AND works properly.
…and test_iter to work w/ the current code structure
…ents in struct initializer). Added RISCV tests to test_iter.c
awesome, thanks for doing this! but this still fails on CI now?
yeah, some compile complaints. I'll try to fix it later.
It seems to be working now. Could you plz review?
Sure, i will do that.
can you please use tabs for indentation in all C code?
arch/RISCV/RISCVDisassembler.c
Outdated
static DecodeStatus DecodeFPR32RegisterClass(MCInst *Inst, uint64_t RegNo,
uint64_t Address,
const void *Decoder) {
please put the open bracket { of a function on a new line
arch/RISCV/RISCVModule.c
Outdated
@@ -0,0 +1,49 @@
/* Capstone Disassembly Engine */
/* By Nguyen Anh Quynh <aquynh@gmail.com>, 2013-2014 */
this code is not mine, please put your name here (and elsewhere) ;-)
Should be fixed now. Plz review it again.
arch/RISCV/RISCVBaseInfo.h
Outdated
// RISCVII - This namespace holds all of the target specific flags that
// instruction info tracks. All definitions must match RISCVInstrFormats.td.
enum
we are following Linux kernel coding style, so please put { after enum, not on the next line
Ok, will do.
arch/RISCV/RISCVBaseInfo.h
Outdated
RISCVFPRndMode_Invalid
};

inline static StringRef roundingModeToString(RoundingMode RndMode) {
please put open bracket of a function on a new line (next line), not on the same line
looks like this file is not indented with tabs yet?
please double check indentation of all other files, too.
i see that this code is commented out, so perhaps auto-indent did not work.
but i can still see code in some other files not in proper indentation format yet.
Oops, that one was commented so the indent tool didn't work on it. Will fix it manually.
arch/RISCV/RISCVDisassembler.c
Outdated
#define GET_SUBTARGETINFO_ENUM
#include "RISCVGenSubtargetInfo.inc"

static uint64_t
our coding style puts the function type & function name on the same line, rather than breaking them into 2 lines like this.
please fix this, and also other places.
arch/RISCV/RISCVDisassembler.c
Outdated
// instruction set extensions have the option of defining instructions up to
// 176 bits wide.
*Size = 4;
if (code_len < 4)
this { should be on the same line with the condition check, not on the next line.
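For reference, the style points raised in this review (tabs for indentation, `{` after `enum` on the same line, function type and name on one line with the opening `{` on the next line, and `{` on the same line as a condition) can be summarized in a small sketch. Every name below is hypothetical and only illustrates the layout; none of it is code from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; only the layout matters here. */

/* enum: the opening brace stays on the same line as the keyword. */
enum demo_status {
	DEMO_FAIL = 0,
	DEMO_OK,
};

/* Function: return type and name share one line, the opening brace
 * goes on the next line. */
static enum demo_status demo_check_length(size_t code_len, uint16_t *size)
{
	assert(size != NULL);

	/* Condition: the opening brace stays on the same line. */
	if (code_len < 4) {
		*size = 0;
		return DEMO_FAIL;
	}

	*size = 4;
	return DEMO_OK;
}
```

The body is indented with tabs, as requested earlier in the thread.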
Seems all fixed. Plz review it again. Thanks.
include/capstone/riscv.h
Outdated
#define CAPSTONE_RISCV_H

/* Capstone Disassembly Engine */
/* By Nguyen Anh Quynh <aquynh@gmail.com>, 2013-2014 */
please put your name here, not mine.
RISCV_INS_FCLASS_S,
RISCV_INS_FCVT_D_L,
RISCV_INS_FCVT_D_LU,
RISCV_INS_FCVT_D_S,
my impression is that we can map related instructions into one, like RISCV_INS_FCVT_L_D & RISCV_INS_FCVT_L_S into just RISCV_INS_FCVT_L, or even all related ones into RISCV_INS_FCVT.
correct me if i am wrong (i dont know much about RISCV), but this is how we did with other archs, like we map ADDxxx into just ADD on X86.
Is this related to opcode? I have no idea if it could be mapped like x86.
@porto703, what do you think? can we map those opcodes to fewer instructions?
If the mapping that you suggest is related to the opcode, then yes, probably instructions with the same opcode can be mapped together. But I am not sure how the mapping is done in x86.
@porto703, the mapping is all in xxxMappingInsn.inc file. for X86, you can grep X86_ADD from X86MappingInsn.inc file, and get these lines:
X86MappingInsn.inc: X86_ADD16i16, X86_INS_ADD,
X86MappingInsn.inc: X86_ADD16mi, X86_INS_ADD,
X86MappingInsn.inc: X86_ADD16mi8, X86_INS_ADD,
X86MappingInsn.inc: X86_ADD16mr, X86_INS_ADD,
X86MappingInsn.inc: X86_ADD16ri, X86_INS_ADD,
X86MappingInsn.inc: X86_ADD16ri8, X86_INS_ADD,
X86MappingInsn.inc: X86_ADD16rm, X86_INS_ADD,
X86MappingInsn.inc: X86_ADD16rr, X86_INS_ADD,
...
this is how we map all X86_ADDxxx to X86_INS_ADD, which is the opcode of all ADD instructions, regardless of operand types.
what do you think, should we do the same thing for RISCV?
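For illustration, here is a minimal sketch of what the same many-to-one mapping could look like for the FCVT family discussed above. All `DEMO_*` names and the `demo_map_insn` helper are hypothetical; they are not taken from Capstone or from this PR, and the generated mapping tables may be laid out differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical internal opcodes: one per encoding variant. */
enum demo_opcode {
	DEMO_OP_FCVT_L_D = 1,
	DEMO_OP_FCVT_L_S,
	DEMO_OP_FCVT_D_L,
	DEMO_OP_ADD,
};

/* Hypothetical public instruction ids: what the API user would see. */
enum demo_insn {
	DEMO_INS_INVALID = 0,
	DEMO_INS_FCVT,
	DEMO_INS_ADD,
};

struct demo_insn_map {
	enum demo_opcode opcode;
	enum demo_insn id;
};

/* Several internal opcodes collapse into one public id. */
static const struct demo_insn_map demo_map[] = {
	{ DEMO_OP_FCVT_L_D, DEMO_INS_FCVT },
	{ DEMO_OP_FCVT_L_S, DEMO_INS_FCVT },
	{ DEMO_OP_FCVT_D_L, DEMO_INS_FCVT },
	{ DEMO_OP_ADD, DEMO_INS_ADD },
};

/* Linear lookup; returns DEMO_INS_INVALID for unknown opcodes. */
static enum demo_insn demo_map_insn(const struct demo_insn_map *map,
		size_t count, enum demo_opcode opcode)
{
	size_t i;

	assert(map != NULL);

	for (i = 0; i < count; i++) {
		if (map[i].opcode == opcode)
			return map[i].id;
	}

	return DEMO_INS_INVALID;
}
```

Whether collapsing to one `FCVT` id or to per-destination ids (such as `FCVT_L`) is the better granularity is exactly the open question in this thread; the sketch only shows the mechanism.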
So it seems they are mapped based on the opcode. In RISCV several instructions can be mapped into the same opcode group, but still there are other fields that are used to select the type of operation within the same opcode group. I didn't have the chance to look further into capstone to understand how the mapping was being used, and if this may fit for RISCV. So that is why I didn't include a more refined mapping into the first version that I worked on.
But at first glance, it looks to me that a similar mapping may be done based on the opcode.
Still, I would like to see @citypw's opinion on this.
Well, I checked both the x86 and RISC-V manuals a bit. They seem very similar: instructions are only mapped to a specific subgroup of an opcode. I only tested the "add*" instructions:
The results of the regression test cases are the same. I don't know; maybe we should work toward cutting the mapped instructions down to fewer ones?
In RISC-V this is not possible: the instruction itself is the only thing that tells the CPU whether the operation is 32-bit or 64-bit.
For example, lw/ld can't be merged into a single load, because the w/d suffix is the only difference between them.
Unlike X86, where we can distinguish 32-bit from 64-bit by the register name, RISC-V only has Xn registers.
What matters is whether the same code can be interpreted differently depending on the mode. If not, then the modes you mentioned make no difference.
For example, X86 has 3 modes (16, 32 & 64), each with different encodings, and Arm has Arm and Thumb modes.
AFAIK, there's no separate hardware mode. This point I'll need to confirm.
RISC-V also has instructions that take up two instead of four bytes (RVC), but unlike Thumb, they don't require a mode switch. They can be executed alongside four-byte instructions. (The only requirement is that the processor supports RVC.)
Then how can we tell whether the next instruction is 2 bytes or 4 bytes?
It's encoded in the lower two bits (instructions are encoded in little-endian).
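A minimal sketch of the length check described above, assuming only the standard 16-bit (RVC) and 32-bit encodings need to be recognized. `demo_riscv_insn_length` is a hypothetical helper, not a function from this PR or from Capstone. In the standard RISC-V encoding, low bits other than `11` mark a 16-bit instruction, while `11` with bits [4:2] not all set marks a 32-bit one; longer encodings are reported as unsupported here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the instruction length in bytes
 * (2 or 4), or 0 if the buffer is too short or the encoding is
 * longer than 32 bits (not handled in this sketch).
 */
static size_t demo_riscv_insn_length(const uint8_t *code, size_t code_len)
{
	assert(code != NULL);

	if (code_len < 2)
		return 0;

	/* Little-endian: byte 0 holds bits [7:0] of the instruction. */
	if ((code[0] & 0x3) != 0x3)
		return 2;	/* bits [1:0] != 11: compressed (RVC) */

	/* bits [4:2] == 111 marks encodings longer than 32 bits. */
	if ((code[0] & 0x1c) == 0x1c)
		return 0;

	return code_len >= 4 ? 4 : 0;
}
```

Because the length is decided per instruction from its first byte, 2-byte and 4-byte instructions can be mixed freely in one stream without any mode switch.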
If so we only have 1 mode (i.e. one encoding scheme), thus we don't need to support cs_option()
@@ -257,6 +258,7 @@ typedef struct cs_opt_skipdata {
// X86: 1 bytes.
// XCore: 2 bytes.
// EVM: 1 bytes.
// RISCV: 4 bytes.
2 bytes might be more appropriate, because of RVC
RISC-V has 32-bit and 64-bit (and 128-bit) variants/modes though, which determine which instructions are valid. In some cases, such as C.JAL and C.ADDIW, the same bytes can disassemble to completely unrelated instructions, depending on the variant. (NOTE: I don't think the code in this pull request supports RVC, and thus the |
@neuschaefer, this sounds like RISC-V has the option to be initialized in 32-bit & 64-bit mode then.
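To make the C.JAL / C.ADDIW point concrete, here is a small sketch under stated assumptions: it looks only at compressed instructions in quadrant 1 (bits [1:0] = `01`) with funct3 (bits [15:13]) = `001`, which is the slot that encodes C.JAL on RV32 and C.ADDIW on RV64. The `demo_xlen` enum and `demo_decode_q1_f1` function are hypothetical and are not how this PR or Capstone implements mode selection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant selector, standing in for a disassembler mode. */
enum demo_xlen {
	DEMO_RV32,
	DEMO_RV64,
};

/*
 * Names the compressed instruction in quadrant 1 with funct3 == 001.
 * Returns NULL for any other encoding (not handled in this sketch).
 */
static const char *demo_decode_q1_f1(uint16_t insn, enum demo_xlen xlen)
{
	unsigned int quadrant = insn & 0x3;		/* bits [1:0] */
	unsigned int funct3 = (insn >> 13) & 0x7;	/* bits [15:13] */

	assert(xlen == DEMO_RV32 || xlen == DEMO_RV64);

	if (quadrant != 0x1 || funct3 != 0x1)
		return NULL;

	/* Same bits, different instruction depending on the variant. */
	return xlen == DEMO_RV32 ? "c.jal" : "c.addiw";
}
```

For example, the halfword `0x2505` falls into this slot, so the sketch names it `c.jal` for `DEMO_RV32` and `c.addiw` for `DEMO_RV64`. This is the kind of ambiguity that a 32-bit/64-bit initialization option would resolve.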
I'm very interested in this pull request, as I'm planning on adding RISC-V support to a project that's already using capstone for x86_64 and aarch64 disassembly. Is there anything that I can do to help here, besides perhaps putting it through some more testing?
this looks pretty good to me, but there are some open questions regarding mapping related instructions into a smaller set, as discussed in this thread. let me know if you can contribute towards that.
This one looks pretty good for inclusion into 4.0 version too.
lots of conflicts, probably because of the changes in branch names, can you please retarget the PR for the 4.1 branch?
Ping?
peng?
@fanfuqiang any update?
The RISC-V port of LLVM is changing fast, especially in recent months. RISC-V 32 is stable for now.
In general this PR looks quite nice, just some concerns raised without feedback yet. Please target the next branch for future PR. We hope to have this ready for v5.
Isn't this merged already?
Yes
Hi,
I rebased the previous PR[1] to the latest code. Plz review it.
[1] #1131