Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure x0 is not modified in mops #403

Merged
merged 4 commits into from
Feb 23, 2024

Conversation

mohanson
Copy link
Collaborator

No description provided.

@@ -1875,6 +1875,7 @@ ckb_vm_x64_execute:
adc TEMP1, TEMP1, TEMP1
WRITE_RD(TEMP3)
WRITE_RS3(TEMP1)
str ZERO_VALUE, ZERO_ADDRESS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this solution, here we are basically introducing 2 patterns:

  • WRITE_RD clears x0 itself
  • For opcode that involves WRITE_RS3(and possibly WRITE_RS1, WRITE_RS2), a manual line to clear x0 is used here

That essentially mean we will have 2 ways to do the same thing, which is hard to enforce. For example: the impl for SBB, which is immediately above this code here, also has a WRITE_RS3(TEMP1), but we are not clearing the x0 value. Yes for now TEMP1 is not x0 so we don't need to do it here. But going into the future, this knowledge can be easily ignored, leading to more bugs.

Personally, I see 2 ways of fixing the code here:

  • WRITE_RS1, WRITE_RS2, WRITE_RS3, and WRITE_RD should all clear x0 at the end
  • We can move the line to clear x0 to NEXT_INST macro, which will ensure all writes to x0 gets cleared when an opcode finishes processing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can distinguish between NEXT_INST, which does not clear x0, and could be used for insts like SD, and NEXT_INST_COMPLETING_REG_WRITES, which clears x0, and could be used for insts that writes to at least one register

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll talk about the benefits of doing this. Because I want to limit the scope of code changes to the 5 MOP instructions introduced in VERSION2, so we can ensure that this won't introduce new bugs in VERSION0 and VERSION1.

You mentioned 3 possible solutions, the third one has too broad of a scope, I consider it more of a refactoring than a bug fix. Among the first two, I prefer the first one, as it aligns with the handling in the Rust version of CKB-VM https://github.com/nervosnetwork/ckb-vm/blob/develop/src/instructions/utils.rs#L77-L89.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rewrite the code according to the first solution. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll talk about the benefits of doing this. Because I want to limit the scope of code changes to the 5 MOP instructions introduced in VERSION2, so we can ensure that this won't introduce new bugs in VERSION0 and VERSION1.

There is actually a second reason why I preferred a more complete solution: it's not just the 5 new MOP instructions, correct me if I'm wrong, I think WRITE_RS3 is already used for MOP instructions in VERSION1 at least. That brings a different question: does a similar bug exists in VERSION1 as well? The Rust interpreter, due to its use of utils::update_register utility, is free from this bug, should be correct. But can be be sure VERSION1's code is free from this bug? This is something I would suggest we go back and give it a check.

I do remember back in the days, when we first discovered bugs in CKB-VM, we introduced a version-based design, and changed the code to replicate the exact same bug in the Rust interpreter to behave the same as the assembly interpreter. Here I think we should do the same: if VERSION1 has the same bug, we should replicate the exact same behavior in the Rust interpreter of VERSION1, and really fix the bug in VERSION2.

So if you really ask my opinion, I would do a more comprehensive check and go with solution 3. It's a more future proof reason and also can deliver slightly better performance. However if you want to go with solution 1 I'm okay but still recommend checking if existing MOP instructions are affected by similar bug

Copy link
Collaborator Author

@mohanson mohanson Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solution 3 is best applied to the develop branch first; do it on the release branch to fix this bug is too risky. That's my main concern. So, I'll use solution 1 as a short-term solution to this problem. In the long run, I agree with you; we need to carefully reassess this logic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really the same thing to me: you could do solution 3, but only use NEXT_INST_COMPLETING_REG_WRITES on newer mops introduced in version 2. We should keep version 1's current behavior anyway.

By using solution 1, we are essentially changing the semantics of WRITE_RS3, etc. I can raise the same question: code in version 1 also uses WRITE_RS3, by changing the underlying code behind WRITE_RS3, are you essentially changing the behavior of version 1's implementation?

Still, the same additional question arises: is version 1 truly unaffected by the same bug?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And that means we already have a bug, to me a proper way of doing this is:

  • Recognizing the current x64 assembly implementation as the canonical one, even though there might be bugs in it, consensus defines that this implementation is the de-facto one
  • Changing the Rust & aarch64 implementation to replicate the exact same behavior(tho buggy it is) as the x64 one in version 1
  • In version 2, we fix the bug in all 3 implementations

@mohanson mohanson merged commit b7da390 into nervosnetwork:develop Feb 23, 2024
11 checks passed
@mohanson mohanson deleted the fix_mop branch February 23, 2024 05:36
mohanson added a commit to libraries/ckb-vm that referenced this pull request Feb 23, 2024
* Make sure x0 is not modified in mops

* Ensure aarch64 behavior is consistent with x86

* Refactor

* Quick fix previous commit
mohanson added a commit to libraries/ckb-vm that referenced this pull request Feb 23, 2024
* Make sure x0 is not modified in mops

* Ensure aarch64 behavior is consistent with x86

* Refactor

* Quick fix previous commit
mohanson added a commit that referenced this pull request Feb 23, 2024
* Make sure x0 is not modified in mops

* Ensure aarch64 behavior is consistent with x86

* Refactor

* Quick fix previous commit
mohanson added a commit that referenced this pull request Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants