Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Neo VM Bug]Fix negative counter issue #3304

Closed
wants to merge 7 commits into from

Conversation

Jim8y
Copy link
Contributor

@Jim8y Jim8y commented Jun 7, 2024

Description

This pr fixes the negative reference counter issue.

Fixes # #3301 (comment)

Type of change

  • Optimization (the change is only an optimization)
  • Style (the change is only a code style for better maintenance or standard purpose)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • TestNegativeReferenceCounter

Test Configuration:

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Jim8y and others added 3 commits June 8, 2024 00:06
…negative-counter

* 'fix-negative-counter' of github.com:Jim8y/neo:
  Fix error when vc_redist is missing (neo-project#3293)
@Jim8y Jim8y added the Critical Issues (bugs) that need to be fixed ASAP label Jun 8, 2024
Copy link
Member

@shargon shargon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer an exception instead of a denial of service, but this could be solved using memory instead of reference counter neo-project/neo-vm#202

@Jim8y
Copy link
Contributor Author

Jim8y commented Jun 8, 2024

I prefer an exception instead of a denial of service, but this will be solved using memory instead of reference counter neo-project/neo-vm#202

@shargon is a good point, should we catch this exception in vm and fault the execution?

@shargon
Copy link
Member

shargon commented Jun 8, 2024

I prefer an exception instead of a denial of service, but this will be solved using memory instead of reference counter neo-project/neo-vm#202

@shargon is a good point, should we catch this exception in vm and fault the execution?

With this patch it will fault, isn't it?

@Jim8y
Copy link
Contributor Author

Jim8y commented Jun 8, 2024

I prefer an exception instead of a denial of service, but this will be solved using memory instead of reference counter neo-project/neo-vm#202

@shargon is a good point, should we catch this exception in vm and fault the execution?

With this patch it will fault, isn't it?

Surely it will, checked, its captured and faulted.

Comment on lines +157 to +161
private void DecrementReferences(int count = 1)
{
references_count -= count;
if (references_count < 0) throw new InvalidOperationException("Reference count is negative.");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is controversial. As I've said in #3301 (comment), normally reference counter is not expected to be negative. And if it's negative then it's a bug somewhere in handlers implementation (like some stackitem not being copied properly in #3301 or something like that) which leads to the negative reference counter.

We should fix the root of the problem, not the consequence of it. To me, it should be a denial of service in case of negative reference counter, not the FAULTed transaction. Because with the current version no one will ever know about the fact that node faced with the negative reference counter issue until we go through all FAULTed transactions and check their FAULT exceptions.

If we replace the current fix with DoS, then the node operator will know about the fact that the node does not count references properly and coredevs will be aware of this problem and need to find the exact place in the handlers code that leads to the negative refcount.

And finally, I'd say that the most suitable case for the uncatchable exception is in

protected virtual void PostExecuteInstruction(Instruction instruction)

Copy link
Contributor Author

@Jim8y Jim8y Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you mean clearly, but what i want to know is why cant we fix this problem?

ReferenceCounter is a standalone and complete class, non-negative should be its critical property from the very first place, why cant i check it?

We should fix the root of the problem, not the consequence of it.

One module should not rely on the correct usage to ensure its correct execution. Fixing deepcopy issue fix nothing at all, cause you have no idea if there is another place that can cause the refernce counter to be miss calculated.
So from my view, both reference counter and the deepcopy are the root of the problem. consequence is DOS attack.

If we replace the current fix with DoS, then the node operator will know about the fact that the node does not count references properly and coredevs will be aware of this problem and need to find the exact place in the handlers code that leads to the negative refcount.

To be honest, i dont quite understand. @shargon how do you think?

Copy link
Member

@AnnaShaleva AnnaShaleva Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why cant we fix this problem?

We can fix it, but not in this way.

why cant i check it?

We can check it. But what I suggest is to throw an uncatchable exception in case of a negative reference counter (the exception that won't be catched by VM and that will stop the node). Because this exception is a direct consequence of a bug in the implementation.

no idea if there is another place that can cause the refernce counter to be miss calculated.

And if one day this exception is triggered, then we need to go and find the root of it and fix it instead of hiding this problem under catchable exception that will FAULT the transaction and allow the node to continue blocks processing.

Copy link
Member

@AnnaShaleva AnnaShaleva Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer an exception instead of a denial of service

It's all about that, because in this case I prefer denial of service instead of FAULTed transaction. It's needed to be able to fix the real bug in the implementation if this problem with negative refcount happens again. Because it's not about VM limits, it's about bugs in the implementation.

@shargon, what do you think?

Copy link
Contributor Author

@Jim8y Jim8y Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what problem of this fix can cause? you think its not necessary or you think it should not being fixed? is negative counter being designed this way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what problem of this fix can cause?

This fix can cause the following problem: if there are other bugs in the node implementation that make VM counter negative, then we likely will never know about them. This fix hides the original problem, it makes the transaction FAULTed and almost no one from core devs tracks FAULTed transactions. How would we know that refcount may become negative in this case?

you think its not necessary or you think it should not being fixed?

Neither of it. I think it should be replaced with another solution.

is negative counter being designed this way?

No, and that's exactly the reason I suggest the denial of service instead of catchable exception that FAULTs transaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix can cause the following problem: if there are other bugs in the node implementation that make VM counter negative, then we likely will never know about them. This fix hides the original problem, it makes the transaction FAULTed and almost no one from core devs tracks FAULTed transactions. How would we know that refcount may become negative in this case?

Faulted is faulted, we got thousands of bugs in the core that can make transaction fault..... if it faults transactions then it wont cause serious problems, we have communities checking them, if they find it, they will report it, we have neogo nodes, they will find it and report it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jimmy, I see that we just have different preferences choosing between DoS and FAULTed transaction. Can we have some other opinions from the @neo-project/core? If other members agree with the current implementation, then probably my PoW isn't correct.

One thing that I'd like to highlight here about the current implementation: we need to check that at least in mainnet this fix does not cause changes in the node state. It may be possible if there is some other transaction in the mainnet history that makes refcount negative (due to some other reason than in #3301).

@Jim8y Jim8y closed this Jun 11, 2024
@Jim8y Jim8y deleted the fix-negative-counter branch June 11, 2024 06:14
@vang1ong7ang
Copy link
Contributor

I agree with @AnnaShaleva

We should fix the root of the problem, not the consequence of it. To me, it should be a denial of service in case of negative reference counter, not the FAULTed transaction.

@AnnaShaleva
Copy link
Member

I agree with @AnnaShaleva

@vang1ong7ang, do you think we need to stop the node explicitly in case of negative reference counter?

@AnnaShaleva
Copy link
Member

do you think we need to stop the node explicitly in case of negative reference counter?

OK, I see, Jimmy has created another PR for that.

@Jim8y Jim8y restored the fix-negative-counter branch June 11, 2024 07:07
@Jim8y Jim8y reopened this Jun 11, 2024
@Jim8y Jim8y removed this from the v3.7.5 milestone Jun 11, 2024
@vang1ong7ang
Copy link
Contributor

@vang1ong7ang, do you think we need to stop the node explicitly in case of negative reference counter?

@AnnaShaleva yes if it happens i suggest to killall

@Jim8y Jim8y closed this Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Used to tag confirmed bugs Critical Issues (bugs) that need to be fixed ASAP Waiting for Review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants