Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support for snapshot cheatcodes #427

Merged
merged 13 commits into from
Dec 17, 2024
Merged

feat: support for snapshot cheatcodes #427

merged 13 commits into from
Dec 17, 2024

Conversation

daejunpark
Copy link
Collaborator

@daejunpark daejunpark commented Dec 11, 2024

support for state and storage snapshots:

  • svm.snapshotStorage(address)
  • vm.snapshotState()

note: two structurally identical symbolic storages/states are given the same snapshot id, but those that are semantically equivalent may not be assigned the same snapshot id. (see SnapshotTest for examples.)

Copy link
Collaborator

@karmacoma-eth karmacoma-eth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incredibly straightforward 👌

@daejunpark daejunpark changed the title feat: support for snapshotStorage(address) cheatcode feat: support for snapshot cheatcodes Dec 12, 2024
"""
Generates a snapshot ID by hashing the current state (code, storage, and balance).
"""
m = hashlib.sha3_256()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we don't need the hash to be cryptographically secure for this, we'd be better off using a high quality regular hash function like xxh3_64 from https://pypi.org/project/xxhash/, about 10x faster:

# built-in sha3_256, not keccak
$ python -m timeit --setup="from hashlib import sha3_256" "sha3_256(b'').digest()"
500000 loops, best of 5: 892 nsec per loop

# xxh64, requires `pip install xxhash`
$ python -m timeit --setup="from xxhash import xxh64_digest" "xxh64_digest(b'')"
2000000 loops, best of 5: 101 nsec per loop

# xxh3_64, requires `pip install xxhash`
$ python -m timeit --setup="from xxhash import xxh3_64_digest" "xxh3_64_digest(b'')"
5000000 loops, best of 5: 98.3 nsec per loop

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for suggestion!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt? 7d08e7a

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency I think it would be better to avoid xxh3_128 and use xxh3_64 only. Plus that way we have an extra 64 bit reserved for some future use!

Copy link
Collaborator Author

@daejunpark daejunpark Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering 128 bits for storages to be more conservative. (Storages are more frequently updated than code and balances.) With 64 bits, the hash collision probability exceeds 10^-12 when generating 6,000 hashes (which is large, but not unrealistic when running halmos in cloud several days.) With 128 bits, however, it would take 20 trillion hashes to reach the 10^-12 collision probability.

That said, 64 bits might still be good enough for our use case. But, in that case, I'm not sure what's the benefit of using xxhash vs the builtin hash(). The builtin hash() can take arbitrary objects as input, and also doesn't seem slower than xxhash:

$ python -m timeit --setup="from xxhash import xxh3_64_digest" "xxh3_64_digest(b'')"
5000000 loops, best of 5: 54.2 nsec per loop

$ python -m timeit "hash(b'')"
20000000 loops, best of 5: 12.6 nsec per loop

One concern with the builtin hash() is that it returns 32-bit hashes on 32-bit machines, but I guess 32-bit machines are rare these days, so it might not be a real concern? Another concern is that it's not deterministic across runs, which could be an issue if we want to support resume features. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me merge this as is, and we can discuss hash functions separately.

@daejunpark daejunpark merged commit 0ef9341 into main Dec 17, 2024
45 checks passed
@daejunpark daejunpark deleted the feat/snapshot branch December 17, 2024 02:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants