-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Register naming in Capstone 5 has changed for ARM. #2078
Comments
In looking at this further, this also applies to |
Sure, can you make a pull req?
|
Also these alias will return in #1949. There it is currently directly patched in the string. Although only enabled via an option. Because it wouldn't mimic the |
I'm unsure how to actually generate the tables. According to
(and so on for each This appears to be due to https://reviews.llvm.org/D86836 which happened back in 2020, and which requires the
(and repeat for other files) So presumably there's a specific version of When I get to version 9, I get errors like:
which makes me think that something I've gone too far. So whilst I can see how to change the source files, I cannot see how to make the tables using the table generator code. The following should be all that's needed: charles@phonewave ~/external/capstone/suite/synctools/tablegen (next)> git diff
diff --git a/suite/synctools/tablegen/ARM/ARMRegisterInfo-digit.td b/suite/synctools/tablegen/ARM/ARMRegisterInfo-digit.td
index 3076bfc8..368f0631 100644
--- a/suite/synctools/tablegen/ARM/ARMRegisterInfo-digit.td
+++ b/suite/synctools/tablegen/ARM/ARMRegisterInfo-digit.td
@@ -84,8 +84,10 @@ def R9 : ARMReg< 9, "r9">, DwarfRegNum<[9]>;
def R10 : ARMReg<10, "r10">, DwarfRegNum<[10]>;
def R11 : ARMReg<11, "r11">, DwarfRegNum<[11]>;
def R12 : ARMReg<12, "r12">, DwarfRegNum<[12]>;
-def SP : ARMReg<13, "r13">, DwarfRegNum<[13]>;
-def LR : ARMReg<14, "r14">, DwarfRegNum<[14]>;
+// R13 and R14 were given names in Capstone 4, even when the CS_OPT_SYNTAX_NOREGNAME option was specified,
+// so here they are given these names as well. See https://github.com/capstone-engine/capstone/issues/2078
+def SP : ARMReg<13, "sp">, DwarfRegNum<[13]>;
+def LR : ARMReg<14, "lr">, DwarfRegNum<[14]>;
def PC : ARMReg<15, "pc">, DwarfRegNum<[15]>;
} ... but I cannot work out how to make that get generated into all the tables. However, I can patch it in the table with a couple of small changes: charles@phonewave ~/external/capstone/bindings/python (next)> git diff
diff --git a/arch/ARM/ARMGenRegisterName_digit.inc b/arch/ARM/ARMGenRegisterName_digit.inc
index c4501030..ebd6bce6 100644
--- a/arch/ARM/ARMGenRegisterName_digit.inc
+++ b/arch/ARM/ARMGenRegisterName_digit.inc
@@ -74,7 +74,7 @@ static const char *getRegisterName_digit(unsigned RegNo)
/* 481 */ 'Q', '1', '0', '_', 'Q', '1', '1', '_', 'Q', '1', '2', '_', 'Q', '1', '3', 0,
/* 497 */ 'd', '1', '3', 0,
/* 501 */ 'q', '1', '3', 0,
- /* 505 */ 'r', '1', '3', 0,
+ /* 505 */ 's', 'p', 0, 0,
/* 509 */ 's', '1', '3', 0,
/* 513 */ 'D', '1', '7', '_', 'D', '1', '9', '_', 'D', '2', '1', '_', 'D', '2', '3', 0,
/* 529 */ 'D', '2', '1', '_', 'D', '2', '2', '_', 'D', '2', '3', 0,
@@ -93,7 +93,7 @@ static const char *getRegisterName_digit(unsigned RegNo)
/* 625 */ 'Q', '1', '1', '_', 'Q', '1', '2', '_', 'Q', '1', '3', '_', 'Q', '1', '4', 0,
/* 641 */ 'd', '1', '4', 0,
/* 645 */ 'q', '1', '4', 0,
- /* 649 */ 'r', '1', '4', 0,
+ /* 649 */ 'l', 'r', 0, 0,
/* 653 */ 's', '1', '4', 0,
/* 657 */ 'D', '1', '8', '_', 'D', '2', '0', '_', 'D', '2', '2', '_', 'D', '2', '4', 0,
/* 673 */ 'D', '2', '1', '_', 'D', '2', '2', '_', 'D', '2', '3', '_', 'D', '2', '4', 0, Obviously fixing it properly in the tables would be better, but this appears to do the job for my test program |
@gerph as there are plans to make a patch release in 5.x series, please send a PR fixing these directly. |
In v5 the register naming for ARM registers changed. This was not a documented behaviour, and may not be desireable. This change makes the register names in v5 the same as they were in v4 - that is that when NOREGNAME is used, R13 and R14 are known as `sp` and `lr`. They had been changed to `r13` and `r14`. See capstone-engine#2078 for more details.
In v5 the register naming for ARM registers changed. This was not a documented behaviour, and may not be desireable. This change makes the register names in v5 the same as they were in v4 - that is that when NOREGNAME is used, R13 and R14 are known as `sp` and `lr`. They had been changed to `r13` and `r14`. See capstone-engine#2078 for more details.
PR created: #2108 |
This isn't so much a bug report as a 'there's a change in behaviour... did you know?' report.
The difference
I have an operating system which uses Capstone as its disassembly system (for reporting faults, etc). The output of the disassembly is used as expectations for the tests. This means that its test output (and, obviously, Capstone's output) must remain the same between runs to ensure that the expectations are met. They started failing once Capstone 5 was released, because the representation of registers has changed for ARM.
Specifically, I'm seeing that register 13 in ARM which was reported as
sp
is now being represented asr13
(when CS_OPT_SYNTAX_NOREGNAME is in force)This isn't a problem for me per-se... although I would prefer to see
sp
as the name of the register, but we can acceptr13
although it's not as nice. There isn't a way to rename registers from within the application, so I do not appear to be able to revert the behaviour to what it was before - I can do a search and replace, however that's a little more expensive.To be clear about the problem, here is the behaviour of disassembling the instruction
LDR r1, [sp, #4]
with both capstone 4 and 5:Capstone 4
Capstone 5
Test program to generate the above output
This is my general disassembly tool for investigating the contents of the capstone output; it's a little wordy, but the important bit is the
md.syntax = CS_OPT_SYNTAX_NOREGNAME
and that the instruction being decoded isb'\x04\x10\x9d\xe5',
(LDR r1,[r13, #4]
).Cause of the change
In v4, the decoding was performed by the
getRegisterName2
function for theCS_OPT_SYNTAX_NOREGRNAME
inARMGenAsmWriter.inc
, which for register id 12 (see above that the base register has the value of 12) we get out the stringsp
:https://github.com/capstone-engine/capstone/blob/v4/arch/ARM/ARMGenAsmWriter.inc#L8634C1-L8634C26
And in the v5 code, the decoding is performed by the
getRegisterName_digit
inARMGenRegisterName_digit.inc
, and again we use register id 12 (again the base register number is 12) which has a stringr13
.https://github.com/capstone-engine/capstone/blob/v5/arch/ARM/ARMGenRegisterName_digit.inc#L77
Obviously these two files are automatically generated, and arguably the use of
r13
when you're not using the register naming schemes is more accurate. However, except forAPCS_U
, register 13 has always been the stack pointer - I believe under APCS_U the stack pointer was inr12
, and unless you're using RISCiX you're not going to care about APCS_U. In all other cases, I believe r13 has the convention of being the stack pointer - and if you're interworking with Thumb, it must be a stack pointer.Expected behaviour
I expected the behaviour of the output to not change between versions, but it's not a strong expectation, as this is a major version update. It would have been nice if the change in register names had been included in the 5.0 change notes in https://github.com/capstone-engine/capstone/releases - just to be clear that it had updated.
What would be nice would be if it were possible to rename registers dynamically, but I suspect that's not going to be easy.
I intend to include a special case to rename
r13
tosp
when disassembling, to retain the old behaviour, if capstone 5 is detected, although I'm not convinced myself that this is a good idea in the long term - that's my problem, not yours.I just wanted to highlight that there is a change in behaviour and that it was unexpected. It's not necessarily a bug unless you are guaranteeing the output format is unchanging between major releases.
The text was updated successfully, but these errors were encountered: