Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ethereum/tests fixtures #1224

Merged
merged 5 commits into from
Sep 10, 2018

Conversation

veox
Copy link
Contributor

@veox veox commented Aug 29, 2018

What was wrong?

See issue #1152, in particular #1152 (comment).

How was it fixed?

Incrementally:

WIP: current state

Cute Animal Picture

"Love this fixture..."

Cat hugs lamp shade

Source: eleejhon @ imgur

Rolls forwards up to 2018-01-22.

Passing of test_state_fixtures determined using `git bisect run`.
@veox veox changed the title WIP: update ethereum/tests fixtures [WIP] Update ethereum/tests fixtures Aug 30, 2018
@veox
Copy link
Contributor Author

veox commented Aug 30, 2018

The test I'm evaluating currently was introduced in ethereum/tests#405 - you can try it by git checkout 9b1f07c58a70d1b17c4489c49eb9bebf4a27d290 in fixtures.

I couldn't find a rationale for it, but looks like it deals with the MODEXP precompile ("contact" 0x5, introduced in Byzantium, EIP-198), which computes (B**E)%M; and the particular case where B==1, E==1, M==<a 999188-byte long value 0x0100..0000>.

EDIT: So, the purpose of this test is likely: the precompile allocates and returns the correct amount of memory, given a very big modlen value.

Running

pytest --cache-clear --capture=no -k "test_state_fixtures[/home/veox/src/py-evm/fixtures/GeneralStateTests/stReturnDataTest/modexp_modsize0_returndatasize.json:modexp_modsize0_returndatasize:Byzantium:4]"

fails on

assert block.header.state_root == fixture['post']['hash']

The helper contract puts some results of the call to the precompile into storage, and that produces a mismatch in the state.

ATM, still looking for the culprit.

@pipermerriam
Copy link
Member

Maybe we can uncover what is wrong with opcode level test cases assuming we know what the expected result is for the inputs you mentioned?

@pipermerriam
Copy link
Member

FYI, the mismatched state root failure case is hard to debug. It normally requires enabling TRACE level output, running the single test and inspecting the logging output, but in this case, it's likely different because of the result being wrong and thus the storage entry that the contract is storing being wrong.

Also maybe look to see if there are VM level tests for this as they would provide explicit output for the call.

@veox
Copy link
Contributor Author

veox commented Aug 31, 2018

This comment (of mine) is incorrect.

Was inspecting EVM memory, but data returned from the precompile call is stored in returndata, and never copied to EVM memory in its entirety.

The EVM memory footprint is correct: a 32-byte value needs to be written to location 1000 (decimal), which overlaps the 1024 boundary, so memory expansion gives a total of 1056 (that's 1024+32) bytes.


Took a glance y-day, there is a modexp test file, but no test case for this is particular.

Tried inspecting the computation, the result seemed wrong - 0, not 1, and maybe truncated: since the output should be as long as the modulo argument, the memory footprint should have been a megabyte or so, but was only 1056 bytes.

computation (as returned by vm.apply_transaction()) should have memory as it existed when exiting the EVM call frame, right?..

@veox
Copy link
Contributor Author

veox commented Aug 31, 2018

I think the computation fails to allocate additional 999188 bytes in memory for the returndata (or something like that). That is, the failure might be coming from the precompile impl-n.

EDIT: no, the above is incorrect. There is no failure to provide returndata. See two comments later.

The contract in the test SSTOREs three 32-byte values:

  • whatever is left on the stack after CALLCODEing the precompile (0 for failure, 1 for success);
  • whatever is at memory location 0x03e8 (hex for decimal 1000); and
  • value of RETURNDATASIZE.

(These are written to storage locations 0x01 to 0x03 of the contract.)

IIUC, 0x03e8 is where the precompile is instructed to put the result. The value there is likely to be 0 - it's the start of 999188-byte-long 0x0000..0001.

I haven't gotten to inspecting actual storage post-computation - trying to find out how...


For ref, disassembly of the helper contract:

000000: CALLDATASIZE
000001: PUSH1 0x00
000003: PUSH1 0x00
000005: CALLDATACOPY
000006: PUSH1 0x20
000008: PUSH2 0x03e8
000011: CALLDATASIZE
000012: PUSH1 0x00
000014: PUSH1 0x00
000016: PUSH1 0x05
000018: GAS
000019: CALLCODE
000020: PUSH1 0x01
000022: SSTORE
000023: PUSH2 0x03e8
000026: MLOAD
000027: PUSH1 0x02
000029: SSTORE
000030: RETURNDATASIZE
000031: PUSH1 0x03
000033: SSTORE

For ref also - LLL source:

{ (CALLDATACOPY 0 0 (CALLDATASIZE))
  [[1]] (CALLCODE (GAS) 5 0 0 (CALLDATASIZE) 1000 32)
  [[2]](MLOAD 1000)
  [[3]](RETURNDATASIZE) }

@pipermerriam
Copy link
Member

Have you looked at the TRACE level logging output. my guess is that it'll go OOG when it tries to allocate such a large memory size.

@veox
Copy link
Contributor Author

veox commented Sep 1, 2018

EDIT: Eventually, this turned out to be the culprit. Gas calculation was wrong in py-evm: it should be much, much higher.


OK, changed logging level in tests/conftest.py; gas consumption for calling the precompile is 23970528 (that's 23.97 million); tx gas allowance is 1 billion, and block limit is set at 10 billion for that test case - so no OOG.

I haven't checked yet if 24023296 is the precise expected gas consumption for the transaction (the test case seems to be under-specified in this regard); the values put to storage, according to the TRACE:

  • 1
  • 0
  • 999188

JIC, here's a gist with TRACE output for the case. (Note: I'm doing some debug-print there, and removed the irrelevant lines, except from the variable list (e.g. memcontents is me inspecting memory).

@veox
Copy link
Contributor Author

veox commented Sep 1, 2018

EDIT: In this comment, I'm confused, thinking two different computations are the same. Maybe skip it.


What's more confusing to me is, doing this printf-style debugging, I can see that the return value length from modexp() (the precompile impl-n) is indeed 999188, but

print('computation.should_erase_return_data:', computation.should_erase_return_data)
print('len(computation._output):', len(computation._output))

at the end of test_state_fixtures() gives False and 0.

So, seems like somewhere between the result being returned from modexp() and the computation being returned from vm.apply_transaction(), computation._output gets truncated...

@veox
Copy link
Contributor Author

veox commented Sep 1, 2018

EDIT: The below is also incorrect. The existing value is specified in the test JSON. :(


Maybe found it?.. When writing 999188 to storage slot 3 of the contract, there seems to be a value 4294967295 already there; so the gas cost is 5000 instead of 20000.

(EDIT: Close accidental, sorry.)

@veox
Copy link
Contributor Author

veox commented Sep 2, 2018

I've been stuck on this single test case for too long, and, to the best of my understanding, py-evm's operation is correct.

My inclination is to mark this test xfail (¯\_(ツ)_/¯), and move on.

Brought up my findings in ethereum/tests#405 (where it was incuded), and requested more info.

@pipermerriam
Copy link
Member

Thanks for all the work you've put into this. I think xfail is fine for now but we'll want to do a full re-visit of any xfail tests before constantinople because it's looking like we'll have a minimally viable client by around that time.

The upstream generated test is not sufficiently specific, and it's hard
to determine which of the two implementations is incorrect.

The principal author of the test case (Yoichi Hirai, github @pirapira)
seems currently unavailable, so it's difficult for me to get specific
details.
@veox veox force-pushed the update-ethereum-test-fixtures branch from 1d8d470 to b11a7b8 Compare September 4, 2018 11:14
@veox
Copy link
Contributor Author

veox commented Sep 4, 2018

Squashed/rebased previous linting failure.

I was unable to apply the same INCORRECT_UPSTREAM_TESTS trick to BlockchainTests, so one CI job may still fail.


Also, accidentally found a typo in py-evm that caused the modexp failure. After all, upstream was correct.

Running full state tests locally while CircleCI checks linting, will submit afterwards.

There was a typo in the "complexity" calculation function, a
special interim value from EIP-198 used to determine total gas
use.

The code path was never previously exercised.

NOTE: previous-commit "fixtures bump" was to commit:

9b1f07c58a70d1b17c4489c49eb9bebf4a27d290

Squashed commit:

tests: update "very big number" in test_modexp_gas_fee_calculation().

... and also fix that test's name, from "calculTation".

The very-big-number is not actually in EIP-198; the latter has
this to say:

> it’s not possible to provide enough gas to make that computation.

That's a bit cryptic, but the gist is that the most that can be
represented in a 256-bit number is

0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

which is

115792089237316195423570985008687907853269984665640564039457584007913129639935

and that's less than the

10684346944173007063723051170445283632835119638284563472873463025465780712173320789629146724657549280936306536701227228889744512638312451529980055895215896

required by this vector, or even the (erroneous)

708647586132375115992254428253169996062012306153720251921480414128428353393856280

that was in the test previously.
@veox veox force-pushed the update-ethereum-test-fixtures branch from e93255c to 7752f05 Compare September 4, 2018 13:46
@veox
Copy link
Contributor Author

veox commented Sep 4, 2018

Evaluation of the next failure - that unearthed by ethereum/tests#419 - starts here.

Try it using cd fixtures && git checkout 61185fe4b8762118fe9ee318539683b47cb04ed6.


LLL helper contract:

{ (seq
    (address)
    (CREATE 0 0 (lll
                  (seq
                    (mstore 0 0x112233)
                    (revert 0 32))
                  0))
    (SSTORE 0 (RETURNDATASIZE))
    (returndatacopy 0 0 32)
    (sstore 1 (mload 0)) ) }

Its disassembly:

000000: ADDRESS
000001: POP
000002: PUSH1 0x0c
000004: DUP1
000005: PUSH1 0x22  ;; 0x22 hex == 34 decimal
000007: PUSH1 0x00
000009: CODECOPY
000010: PUSH1 0x00
000012: PUSH1 0x00
000014: CREATE
000015: POP
000016: RETURNDATASIZE
000017: PUSH1 0x00
000019: SSTORE
000020: PUSH1 0x20
000022: PUSH1 0x00
000024: PUSH1 0x00
000026: RETURNDATACOPY
000027: PUSH1 0x00
000029: MLOAD
000030: PUSH1 0x01
000032: SSTORE
000033: STOP
000034: PUSH3 0x112233
000038: PUSH1 0x00
000040: MSTORE
000041: PUSH1 0x20
000043: PUSH1 0x00
000045: REVERT

TRACE log.


Expected values at storage ({0: 0x20, 1: 0x112233}):

https://github.com/ethereum/tests/blob/61185fe4b8762118fe9ee318539683b47cb04ed6/src/GeneralStateTestsFiller/stRevertTest/RevertInCreateInInitFiller.json#L31-L39

TRACE says those are indeed the values stored (see link above, towards the end of the file).

@veox
Copy link
Contributor Author

veox commented Sep 4, 2018

Candidate, narrowed down to a difference in gas usage:

The difference is 15000, which could suggest a miscalculation of gas use on SSTORE by one of the implementations.

@pipermerriam
Copy link
Member

@veox not sure what commit you are on, but the net gas metering for SSTORE hasn't been implemented yet so if those are constantinople tests that could be the case.

If not, the simple case is a small miscalculation, but I've found that often the gas usage difference is actually indicative of a previous error where either a value should not have been written to storage (but it was) and thus a subsequent write to that storage slot costs less than it should, or some other variant of the same thing.

@veox
Copy link
Contributor Author

veox commented Sep 5, 2018

@pipermerriam Thanks for the suggestion! I'm trying not to assume things ("15000 therefore SSTORE" is also just a guess).

This is a Byzantium test, so net gas metering isn't in play yet.

I've edited the relevant previous comment to indicate the commit used, for the sake of easier reproduction.

@veox
Copy link
Contributor Author

veox commented Sep 5, 2018

OK, here's what I think is going on in this test. I might still be incorrect, as the execution TRACE lacks a couple of details.


There is an account 0x6295ee1b4f6dd65047762f924ecd367c17eabf8f with empty code, nonce 0, but with a storage location 0x00 already set to a non-zero value 0x01. (How? Test case magic!)

A call is made by another account, 0xa94f5374fce5edbc8e2a8697c15331677e6ebf0b, with a CREATE that would place code at 0x6295..bf8f... except the inner CREATE call (think "Solidity constructor") always REVERTs.

At this point - question: what is the value at storage location 0x00 of 0x6295..bf8f?

  • According to the test case (i.e. then-cpp-ethereum, now-aleth that generated the test); probably also geth and older parity, - the storage is empty. So the two writes to empty storage consume 20000 gas each.
    • EDIT: Might be that nonce of 0x6295..bf8f also gets bumped to 1, but haven't verified this.
  • According to py-evm, the storage looks as it has at the beginning of the test - that is, location 0x00 has value 0x01. So, the writes consume 5000 and 20000 gas for locations 0x00 and 0x01 respectively.
    • EDIT: Chances are, nonce of 0x6295..bf8f is also still at 0, but haven't verified this.

(Irrelevant section snipped on 2018-12-11.)

@carver
Copy link
Contributor

carver commented Sep 5, 2018

Is the tl;dr that a CREATE call should always start by wiping out the storage at the address (and that emptying the storage is not reverted on a REVERT)?

(nice bug hunting btw)

@holiman
Copy link

holiman commented Sep 5, 2018

See ethereum/EIPs#684

@veox
Copy link
Contributor Author

veox commented Sep 5, 2018

@carver

Is the tl;dr that a CREATE call should always start by wiping out the storage at the address (and that emptying the storage is not reverted on a REVERT)?

It would seem so; except the yellowpaper (Section 7, "Contract Creation") either explicitly states otherwise, is contradictory, or is under-specified:

The account’s nonce is initially defined as one, the balance as the value passed, the storage as empty (...)

But then later:

execution may exit before the code has come to a natural halting state. In this (and several other) exceptional cases we say an out-of-gas (OOG) exception has occurred: The evaluated state is defined as being the empty set, ∅, and the entire create operation should have no effect on the state, effectively leaving it as it was immediately prior to attempting the creation.

The YP does not consider a "synthetic" case, like in this test.


@holiman (near-copy from gitter chat)

It seems that ethereum/EIPs#684 considers {some-nonce, some-code, unspecified-storage} accounts. This case is the somewhat-stricter opposite: account creation is attempted on an existing {zero-nonce, empty-code, some-storage} account.

The proposition is somewhat ludicrous, but here we are!..

I assume the test case is being run, and passes against all the implementations handled by Hive?..

@pipermerriam
Copy link
Member

@veox it appears you've got a handle on the why (your explanation makes sense to me). Let me know if you need guidance on how to proceed with fixing things.

@veox
Copy link
Contributor Author

veox commented Sep 5, 2018

@pipermerriam Cool!

Definitely confused here, might take a day or two to formulate something concise.


It seems to me that the test case is in contradiction to the yellowpaper. If so, then all the major implementations may be as well (need to verify). If so, then an exception/clarification to the yellowpaper might be needed; possibly complemented by a "correcting" EIP for Constantinople/Istanbul...

If my guesses above are correct, and aleth/geth/parity implement it this way, then I see no recourse but to implement it the same way also, at least for the Byzantium block dynasty.

Would be nice to get confirmation from these other implementations... Brought it up on gitter a few hours ago, where @holiman jumped in from.

@veox veox force-pushed the update-ethereum-test-fixtures branch 2 times, most recently from 29f11ac to e02123e Compare September 7, 2018 12:40
@veox
Copy link
Contributor Author

veox commented Sep 7, 2018

TL;DR: I won't be fixing this one. q:D


The "correct" approach from the very beginning would have been immediately marking the test xfail when bumping fixtures. This highlights that a specific upstream test is failing, and makes sense no matter where the breakage originates from. I've done this now in e02123e. (EDIT: Now became part of commit 11d61b0.)

As to the test itself - IMO this really should be fixed upstream. Leaving a newly-CREATEd contract's state wiped after a REVERT contradicts the yellowpaper.

This difference between py-evm and (probably) every other implementation is unfortunate. However, I think it doesn't present a danger of consensus failure on its own, as the initial state that the test starts with can't (currently) be arrived at "naturally", using regular consensus rules. (This is why I called it a "synthetic" test.)

However, it seems that this particular detail is a quirk in the test definition; it's probably testing for something else. By marking the test xfail, py-evm is no longer testing for that, too (but, note: only on Byzantium; the earlier chains are still being tested).

If someone's inclined, they can push py-evm's behaviour towards that in the test, but I think this is wrong.


A-a-and a test still fails on CircleCI. Will investigate another day.

@sorpaas
Copy link

sorpaas commented Sep 8, 2018

We recently fixed a bug for checkpointing in openethereum/parity-ethereum#9319, and I believe right now, parity's behavior on this is the same as py-evm -- on REVERT or out-of-gas, old storage values should re-appear, if that account ever existed.

@veox veox force-pushed the update-ethereum-test-fixtures branch from 58b8192 to bdfba46 Compare September 9, 2018 07:23
…evertInCreateInInit as xfail.

Rolls forwards up to 2018-03-01.

Break in `RevertInCreateInInit.json` determined by
`git bisect run`.

The test is marked `xfail` to expicitly highlight the fact.

This is done in 3 places - all are run as part of CI.
@veox veox force-pushed the update-ethereum-test-fixtures branch from bdfba46 to 11d61b0 Compare September 9, 2018 08:25
Determined state tests as "good" by `git bisect run`.
@veox
Copy link
Contributor Author

veox commented Sep 9, 2018

Current state might be where the "simple" fixes end.

The next merge commit in ethereum/fixtures gives 74 failures for test_state_fixtures() alone, IMO better handled in a separate PR. Also, the discussion here is getting a bit too long, and hard to navigate.

Unmarking as [WIP].

Should be OK to squash-merge, especially since there's an interim commit that failed CI.

@veox veox changed the title [WIP] Update ethereum/tests fixtures Update ethereum/tests fixtures Sep 9, 2018
@pipermerriam pipermerriam merged commit d521ea2 into ethereum:master Sep 10, 2018
andresilva pushed a commit to openethereum/parity-ethereum that referenced this pull request Sep 12, 2018
…e-appear (#9532)

Because more tests won't hurt. :)

Add a test case for ethereum/py-evm#1224 (comment) where if contract creation fails, old storage values (if ever existed) should re-appear.
@veox veox deleted the update-ethereum-test-fixtures branch September 12, 2018 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants