-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Neo VM Bug]Fix negative counter issue #3304
Conversation
…negative-counter * 'fix-negative-counter' of github.com:Jim8y/neo: Fix error when vc_redist is missing (neo-project#3293)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer an exception instead of a denial of service, but this could be solved using memory instead of reference counter neo-project/neo-vm#202
@shargon is a good point, should we catch this exception in vm and fault the execution? |
With this patch it will fault, isn't it? |
Surely it will, checked, its captured and faulted. |
private void DecrementReferences(int count = 1) | ||
{ | ||
references_count -= count; | ||
if (references_count < 0) throw new InvalidOperationException("Reference count is negative."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is controversial. As I've said in #3301 (comment), normally reference counter is not expected to be negative. And if it's negative then it's a bug somewhere in handlers implementation (like some stackitem not being copied properly in #3301 or something like that) which leads to the negative reference counter.
We should fix the root of the problem, not the consequence of it. To me, it should be a denial of service in case of negative reference counter, not the FAULTed transaction. Because with the current version no one will ever know about the fact that node faced with the negative reference counter issue until we go through all FAULTed transactions and check their FAULT exceptions.
If we replace the current fix with DoS, then the node operator will know about the fact that the node does not count references properly and coredevs will be aware of this problem and need to find the exact place in the handlers code that leads to the negative refcount.
And finally, I'd say that the most suitable case for the uncatchable exception is in
neo/src/Neo.VM/ExecutionEngine.cs
Line 282 in 38cc0e9
protected virtual void PostExecuteInstruction(Instruction instruction) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you mean clearly, but what i want to know is why cant we fix this problem?
ReferenceCounter is a standalone and complete class, non-negative should be its critical property from the very first place, why cant i check it?
We should fix the root of the problem, not the consequence of it.
One module should not rely on the correct usage to ensure its correct execution. Fixing deepcopy issue fix nothing at all, cause you have no idea if there is another place that can cause the refernce counter to be miss calculated.
So from my view, both reference counter and the deepcopy are the root of the problem. consequence is DOS attack.
If we replace the current fix with DoS, then the node operator will know about the fact that the node does not count references properly and coredevs will be aware of this problem and need to find the exact place in the handlers code that leads to the negative refcount.
To be honest, i dont quite understand. @shargon how do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why cant we fix this problem?
We can fix it, but not in this way.
why cant i check it?
We can check it. But what I suggest is to throw an uncatchable exception in case of a negative reference counter (the exception that won't be catched by VM and that will stop the node). Because this exception is a direct consequence of a bug in the implementation.
no idea if there is another place that can cause the refernce counter to be miss calculated.
And if one day this exception is triggered, then we need to go and find the root of it and fix it instead of hiding this problem under catchable exception that will FAULT the transaction and allow the node to continue blocks processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer an exception instead of a denial of service
It's all about that, because in this case I prefer denial of service instead of FAULTed transaction. It's needed to be able to fix the real bug in the implementation if this problem with negative refcount happens again. Because it's not about VM limits, it's about bugs in the implementation.
@shargon, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what problem of this fix can cause? you think its not necessary or you think it should not being fixed? is negative counter being designed this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what problem of this fix can cause?
This fix can cause the following problem: if there are other bugs in the node implementation that make VM counter negative, then we likely will never know about them. This fix hides the original problem, it makes the transaction FAULTed and almost no one from core devs tracks FAULTed transactions. How would we know that refcount may become negative in this case?
you think its not necessary or you think it should not being fixed?
Neither of it. I think it should be replaced with another solution.
is negative counter being designed this way?
No, and that's exactly the reason I suggest the denial of service instead of catchable exception that FAULTs transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix can cause the following problem: if there are other bugs in the node implementation that make VM counter negative, then we likely will never know about them. This fix hides the original problem, it makes the transaction FAULTed and almost no one from core devs tracks FAULTed transactions. How would we know that refcount may become negative in this case?
Faulted is faulted, we got thousands of bugs in the core that can make transaction fault..... if it faults transactions then it wont cause serious problems, we have communities checking them, if they find it, they will report it, we have neogo nodes, they will find it and report it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jimmy, I see that we just have different preferences choosing between DoS and FAULTed transaction. Can we have some other opinions from the @neo-project/core? If other members agree with the current implementation, then probably my PoW isn't correct.
One thing that I'd like to highlight here about the current implementation: we need to check that at least in mainnet this fix does not cause changes in the node state. It may be possible if there is some other transaction in the mainnet history that makes refcount negative (due to some other reason than in #3301).
I agree with @AnnaShaleva
|
@vang1ong7ang, do you think we need to stop the node explicitly in case of negative reference counter? |
OK, I see, Jimmy has created another PR for that. |
@AnnaShaleva yes if it happens i suggest to killall |
Description
This pr fixes the negative reference counter issue.
Fixes # #3301 (comment)
Type of change
How Has This Been Tested?
Test Configuration:
Checklist: