Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic RISCV support #1198

Closed
wants to merge 24 commits into from
Closed

Basic RISCV support #1198

wants to merge 24 commits into from

Conversation

citypw
Copy link

@citypw citypw commented Jul 5, 2018

Hi,

I rebased the previous PR[1] to the latest code. Plz review it.

[1] #1131

@aquynh
Copy link
Collaborator

aquynh commented Jul 5, 2018

awesome, thanks for doing this!

but this still fails on CI now?

@citypw
Copy link
Author

citypw commented Jul 5, 2018

yeah, some compile complains. I'll try to fix it later.

@citypw
Copy link
Author

citypw commented Jul 5, 2018

It seems working now. Could you plz review?

@aquynh
Copy link
Collaborator

aquynh commented Jul 5, 2018 via email

@aquynh
Copy link
Collaborator

aquynh commented Jul 5, 2018

can you please use tabs for indentation in all C code?


static DecodeStatus DecodeFPR32RegisterClass(MCInst *Inst, uint64_t RegNo,
uint64_t Address,
const void *Decoder) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please put the open bracket { of a function on a new line

@@ -0,0 +1,49 @@
/* Capstone Disassembly Engine */
/* By Nguyen Anh Quynh <aquynh@gmail.com>, 2013-2014 */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these code are not mine, please put your name here (and elsewhere) ;-)

@citypw
Copy link
Author

citypw commented Jul 5, 2018

Should be fixed now. Plz review it again.


// RISCVII - This namespace holds all of the target specific flags that
// instruction info tracks. All definitions must match RISCVInstrFormats.td.
enum
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are following Linux kernel coding style, so please put { after enum, not on the next line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will do.

RISCVFPRndMode_Invalid
};

inline static StringRef roundingModeToString(RoundingMode RndMode) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please put open bracket of a function on a new line (next line), not on the same line

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this file is not indented with tabs yet?

please double check indentation of all other files, too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see that this code is commented out, so perhaps auto-indent did not work.
but i can still see code in some other files not in proper indentation format yet.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that one was commented so the indent tool didn't work on it. Will fix it manually.

#define GET_SUBTARGETINFO_ENUM
#include "RISCVGenSubtargetInfo.inc"

static uint64_t
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our coding style put function type & function name on the same line, but does not break them into 2 lines like this.
please fix this, and also other places.

// instruction set extensions have the option of defining instructions up to
// 176 bits wide.
*Size = 4;
if (code_len < 4)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this { should be on the same line with the condition check, not on the next line.

@citypw
Copy link
Author

citypw commented Jul 5, 2018

Seems all fixed. Plz review it again. Thanks.

#define CAPSTONE_RISCV_H

/* Capstone Disassembly Engine */
/* By Nguyen Anh Quynh <aquynh@gmail.com>, 2013-2014 */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please put your name here, not mine.

RISCV_INS_FCLASS_S,
RISCV_INS_FCVT_D_L,
RISCV_INS_FCVT_D_LU,
RISCV_INS_FCVT_D_S,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my impression is that we can map related instructions into one, like RISCV_INS_FCVT_L_D & RISCV_INS_FCVT_L_S into just RISCV_INS_FCVT_L, or even all related ones into RISCV_INS_FCVT.

correct me if i am wrong (i dont know much about RISCV), but this is how we did with other archs, like we map ADDxxx into just ADD on X86.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to opcode? I have no idea if it could be mapped like x86.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@porto703, what do you think? can we map those opcode to fewer instructions?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the mapping that you suggest is related to the opcode, then yes, probably instructions with the same opcode can be mapped together. But I am not sure how the mapping is done in x86.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@porto703, the mapping is all in xxxMappingInsn.inc file. for X86, you can grep X86_ADD from X86MappingInsn.inc file, and get these lines:

X86MappingInsn.inc:     X86_ADD16i16, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16mi, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16mi8, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16mr, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16ri, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16ri8, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16rm, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16rr, X86_INS_ADD,
...

this is how we map all X86_ADDxxx to X86_INS_ADD, which is the opcode of all ADD instructions, regardless of operand types.

what do you think, should we do the same thing for RISCV?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems they are mapped based on the opcode. In RISCV several instructions can be mapped into the same opcode group, but still there are other fields that are used to select the type of operation within the same opcode group. I didn't have the chance to look further into capstone to understand how the mapping was being used, and if this may fit for RISCV. So that is why I didn't include a more refined mapping into the first version that I worked on.
But at first glance, it looks to me that a similar mapping may be done based on the opcode.
Still I would like to see @citypw opinion on this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, I checked the both x86 and RV manuals a bit. It seems very similar which only mapped to the specific subgroup of opcode. I only tested the "add*" ins:

citypw@ac4cc0f
citypw@d8d8adb

The results of regression test case are the same. I don't know. Maybe we should work toward to cut the mapped ins into fewer ones?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@porto703, the mapping is all in xxxMappingInsn.inc file. for X86, you can grep X86_ADD from X86MappingInsn.inc file, and get these lines:

X86MappingInsn.inc:     X86_ADD16i16, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16mi, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16mi8, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16mr, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16ri, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16ri8, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16rm, X86_INS_ADD,
X86MappingInsn.inc:     X86_ADD16rr, X86_INS_ADD,
...

this is how we map all X86_ADDxxx to X86_INS_ADD, which is the opcode of all ADD instructions, regardless of operand types.

what do you think, should we do the same thing for RISCV?

In riscv this is not possible, it is the only reason tell the CPU which is 32bit or 64bit.
for example, lw/ld can't merge to load. it only tells the difference according to the w/d.
not like X86 we call distinguish 32bit/64bit follow the register name, In riscv it only has
Xn.

@aquynh
Copy link
Collaborator

aquynh commented Jul 6, 2018 via email

@citypw
Copy link
Author

citypw commented Jul 6, 2018

AFAIK, there's no separate hardware mode. This point I'll need to confirm.

@neuschaefer
Copy link

RISC-V also has instructions that take up two instead of four bytes (RVC), but unlike Thumb, they don't require a mode switch. They can be executed alongside four-byte instructions. (The only requirement is that the processor supports RVC.)

@aquynh
Copy link
Collaborator

aquynh commented Jul 6, 2018 via email

@neuschaefer
Copy link

It's encoded in the lower two bits (instructions are encoded in little-endian)

@aquynh
Copy link
Collaborator

aquynh commented Jul 6, 2018 via email

@@ -257,6 +258,7 @@ typedef struct cs_opt_skipdata {
// X86: 1 bytes.
// XCore: 2 bytes.
// EVM: 1 bytes.
// RISCV: 4 bytes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 bytes might be more appropriate, because of RVC

@neuschaefer
Copy link

RISC-V has 32-bit and 64-bit (and 128-bit) variants/modes though, which determine which instructions are valid. In some cases, such as C.JAL and C.ADDIW, the same bytes can disassemble to completely unrelated instructions, depending on the variant.

(NOTE: I don't think the code in this pull request supports RVC, and thus the C.* instructions above, yet)

@aquynh
Copy link
Collaborator

aquynh commented Jul 14, 2018

@neuschaefer , this sounds like RISV has option to be initialized in 32bit & 64 bit mode then.

@XVilka
Copy link
Contributor

XVilka commented Sep 13, 2018

Ping? @aquynh @citypw

@etherealvisage
Copy link

I'm very interested in this pull request, as I'm planning on adding RISC-V support to a project that's already using capstone for x86_64 and aarch64 disassembly. Is there anything that I can do to help here, besides perhaps putting it through some more testing?

@aquynh
Copy link
Collaborator

aquynh commented Oct 27, 2018

this looks pretty good to me, but there are some open questions regarding mapping related instructions into smaller set, as discussed in this thread. let me know if you can contribute towards that.

@XVilka
Copy link
Contributor

XVilka commented Dec 14, 2018

This one looks pretty good for inclusion into 4.0 version too.

@radare
Copy link
Contributor

radare commented Jan 22, 2019

lots of conflicts, probably because of the changes in branch names, can you please retarget the PR for the 4.1 branch?

@XVilka
Copy link
Contributor

XVilka commented Feb 5, 2019

Ping?

@radare
Copy link
Contributor

radare commented Feb 15, 2019

peng?

@citypw
Copy link
Author

citypw commented Feb 15, 2019

@fanfuqiang any update?

@fanfuqiang
Copy link
Contributor

fanfuqiang commented Feb 15, 2019

lots of conflicts, probably because of the changes in branch names, can you please retarget the PR for the 4.1 branch?

The RISC-V port of LLVM is changing fast, especially in recent months. RISC-V 32 is stable for now.
I will make a PR base on the @citypw and LLVM upstream RISC-V, along with the LLVM TableGen patches, in the recent days.

@aquynh
Copy link
Collaborator

aquynh commented Feb 15, 2019

In general this PR looks quite nice, just some concerns raised without feedback yet.

Please target the next branch for future PR. We hope to have this ready for v5.

@radare
Copy link
Contributor

radare commented Jun 23, 2019

Isnt this merged already?

@aquynh
Copy link
Collaborator

aquynh commented Jun 23, 2019 via email

@aquynh aquynh closed this Jun 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants