Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Neo VM Bug]Fix negative counter issue #3304

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 11 additions & 3 deletions src/Neo.VM/ReferenceCounter.cs
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,11 @@

using Neo.VM.StronglyConnectedComponents;
using Neo.VM.Types;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using Buffer = Neo.VM.Types.Buffer;

namespace Neo.VM
{
Expand Down Expand Up @@ -115,7 +117,7 @@ internal int CheckZeroReferred()
tracked_items.Remove(item);
if (item is CompoundType compound)
{
references_count -= compound.SubItemsCount;
DecrementReferences(compound.SubItemsCount);
foreach (StackItem subitem in compound.SubItems)
{
if (component.Contains(subitem)) continue;
Expand All @@ -136,7 +138,7 @@ internal int CheckZeroReferred()

internal void RemoveReference(StackItem item, CompoundType parent)
{
references_count--;
DecrementReferences();
if (!NeedTrack(item)) return;
cached_components = null;
item.ObjectReferences![parent].References--;
Expand All @@ -146,10 +148,16 @@ internal void RemoveReference(StackItem item, CompoundType parent)

internal void RemoveStackReference(StackItem item)
{
references_count--;
DecrementReferences();
if (!NeedTrack(item)) return;
if (--item.StackReferences == 0)
zero_referred.Add(item);
}

private void DecrementReferences(int count = 1)
{
references_count -= count;
if (references_count < 0) throw new InvalidOperationException("Reference count is negative.");
}
Comment on lines +157 to +161
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is controversial. As I've said in #3301 (comment), normally reference counter is not expected to be negative. And if it's negative then it's a bug somewhere in handlers implementation (like some stackitem not being copied properly in #3301 or something like that) which leads to the negative reference counter.

We should fix the root of the problem, not the consequence of it. To me, it should be a denial of service in case of negative reference counter, not the FAULTed transaction. Because with the current version no one will ever know about the fact that node faced with the negative reference counter issue until we go through all FAULTed transactions and check their FAULT exceptions.

If we replace the current fix with DoS, then the node operator will know about the fact that the node does not count references properly and coredevs will be aware of this problem and need to find the exact place in the handlers code that leads to the negative refcount.

And finally, I'd say that the most suitable case for the uncatchable exception is in

protected virtual void PostExecuteInstruction(Instruction instruction)

Copy link
Contributor Author

@Jim8y Jim8y Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you mean clearly, but what i want to know is why cant we fix this problem?

ReferenceCounter is a standalone and complete class, non-negative should be its critical property from the very first place, why cant i check it?

We should fix the root of the problem, not the consequence of it.

One module should not rely on the correct usage to ensure its correct execution. Fixing deepcopy issue fix nothing at all, cause you have no idea if there is another place that can cause the refernce counter to be miss calculated.
So from my view, both reference counter and the deepcopy are the root of the problem. consequence is DOS attack.

If we replace the current fix with DoS, then the node operator will know about the fact that the node does not count references properly and coredevs will be aware of this problem and need to find the exact place in the handlers code that leads to the negative refcount.

To be honest, i dont quite understand. @shargon how do you think?

Copy link
Member

@AnnaShaleva AnnaShaleva Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why cant we fix this problem?

We can fix it, but not in this way.

why cant i check it?

We can check it. But what I suggest is to throw an uncatchable exception in case of a negative reference counter (the exception that won't be catched by VM and that will stop the node). Because this exception is a direct consequence of a bug in the implementation.

no idea if there is another place that can cause the refernce counter to be miss calculated.

And if one day this exception is triggered, then we need to go and find the root of it and fix it instead of hiding this problem under catchable exception that will FAULT the transaction and allow the node to continue blocks processing.

Copy link
Member

@AnnaShaleva AnnaShaleva Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer an exception instead of a denial of service

It's all about that, because in this case I prefer denial of service instead of FAULTed transaction. It's needed to be able to fix the real bug in the implementation if this problem with negative refcount happens again. Because it's not about VM limits, it's about bugs in the implementation.

@shargon, what do you think?

Copy link
Contributor Author

@Jim8y Jim8y Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what problem of this fix can cause? you think its not necessary or you think it should not being fixed? is negative counter being designed this way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what problem of this fix can cause?

This fix can cause the following problem: if there are other bugs in the node implementation that make VM counter negative, then we likely will never know about them. This fix hides the original problem, it makes the transaction FAULTed and almost no one from core devs tracks FAULTed transactions. How would we know that refcount may become negative in this case?

you think its not necessary or you think it should not being fixed?

Neither of it. I think it should be replaced with another solution.

is negative counter being designed this way?

No, and that's exactly the reason I suggest the denial of service instead of catchable exception that FAULTs transaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix can cause the following problem: if there are other bugs in the node implementation that make VM counter negative, then we likely will never know about them. This fix hides the original problem, it makes the transaction FAULTed and almost no one from core devs tracks FAULTed transactions. How would we know that refcount may become negative in this case?

Faulted is faulted, we got thousands of bugs in the core that can make transaction fault..... if it faults transactions then it wont cause serious problems, we have communities checking them, if they find it, they will report it, we have neogo nodes, they will find it and report it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jimmy, I see that we just have different preferences choosing between DoS and FAULTed transaction. Can we have some other opinions from the @neo-project/core? If other members agree with the current implementation, then probably my PoW isn't correct.

One thing that I'd like to highlight here about the current implementation: we need to check that at least in mainnet this fix does not cause changes in the node state. It may be possible if there is some other transaction in the mainnet history that makes refcount negative (due to some other reason than in #3301).

}
}
36 changes: 36 additions & 0 deletions tests/Neo.VM.Tests/UT_ReferenceCounter.cs
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Neo.VM;
using Neo.VM.Types;
using System;
using Array = Neo.VM.Types.Array;

namespace Neo.Test
{
Expand Down Expand Up @@ -240,5 +242,39 @@ public void TestArrayNoPush()
Assert.AreEqual(VMState.HALT, engine.Execute());
Assert.AreEqual(array.Count, engine.ReferenceCounter.Count);
}

[TestMethod]
public void TestNegativeReferenceCounter()
{
var referenceCounter = new ReferenceCounter();
var array = new Array(referenceCounter) { OnStack = true };
referenceCounter.AddStackReference(array);
for (var i = 0; i < 100; i++)
{
var item = new Integer(i) { OnStack = true };
array.Add(item);
}

Assert.AreEqual(101, referenceCounter.Count);
referenceCounter.CheckZeroReferred();
Assert.AreEqual(101, referenceCounter.Count);
var copyArray = array.DeepCopy(true);
Assert.AreEqual(201, referenceCounter.Count);
referenceCounter.CheckZeroReferred(); // Deepcopied value will be removed from the reference counter.
Assert.AreEqual(101, referenceCounter.Count);
// If we add the deepcopied value again, the reference counter will increase by only 1.
referenceCounter.AddStackReference(copyArray);
Assert.AreEqual(102, referenceCounter.Count);
referenceCounter.RemoveStackReference(copyArray);
// Removing the deepcopied value will decrease the reference counter by 101.
referenceCounter.CheckZeroReferred();
Assert.AreEqual(1, referenceCounter.Count);

// Bellow will trigger the negative reference counter exception
referenceCounter.AddStackReference(copyArray);
Assert.AreEqual(2, referenceCounter.Count);
referenceCounter.RemoveStackReference(copyArray);
Assert.ThrowsException<InvalidOperationException>(() => referenceCounter.CheckZeroReferred());
}
}
}