vm: store compiled contracts directly in the filesystem #10791
Conversation
Going through rocksdb involves compression, all the rocksdb logic etc. That can't be cheap. This is also a cache so none of the features really benefit the use-case. Storing compiled contracts as files also makes it much more straightforward to inspect them now if the need to do so arises. For example an estimation of the size distribution is one `ls` away.
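For a programmatic equivalent of that `ls`, a small helper could enumerate the cache directory and report per-file sizes (a hypothetical sketch; `cache_size_distribution` is an assumed name, not part of the PR):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: list (file name, size in bytes) pairs for every
/// regular file in a cache directory -- the programmatic equivalent of
/// eyeballing `ls -l` on the cache.
fn cache_size_distribution(dir: &Path) -> io::Result<Vec<(String, u64)>> {
    let mut out = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_file() {
            out.push((entry.file_name().to_string_lossy().into_owned(), meta.len()));
        }
    }
    // Sort by size so the distribution is easy to read at a glance.
    out.sort_by_key(|(_, size)| *size);
    Ok(out)
}
```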
Looks fairly straightforward. Even if we do not end up using it in production, I don't see a big harm in merging this in. Excited to see some benchmarking results.
runtime/near-vm-runner/src/runner.rs (outdated)

```rust
/// Cache for compiled contracts code in plain filesystem.
impl CompiledContractCache for FilesystemCompiledContractCache {
    fn put(&self, key: &CryptoHash, value: CompiledContract) -> std::io::Result<()> {
```
Should we add a check for whether the file already exists, so that the function is effectively a no-op in that case?
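One way the suggested guard could look (a hypothetical sketch; `already_cached` and the hex-named file layout are assumptions, not the actual nearcore layout):

```rust
use std::path::Path;

/// Hypothetical guard for the review suggestion: if the cache file for this
/// key already exists on disk, `put` could return early and skip the write.
/// Assumes entries are stored as one file per key, named by the key's hex form.
fn already_cached(cache_dir: &Path, key_hex: &str) -> bool {
    cache_dir.join(key_hex).exists()
}
```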
The RocksDB-based implementation notes that there is some non-determinism somewhere (though that is probably an outdated statement) and that the intended behaviour is to overwrite in case we end up with different data. I didn't want to prod that particular bear's nest this time around :)
I haven't pursued mmap and such yet as it requires non-trivial additional changes, and those can be made as part of a followup too.
Creating the cache requires us to know the right place to create it. This required me to thread the thing pretty much through the entire stack, but it also resulted in some benefits -- for instance, we won't be unnecessarily doing any directory-related operations every time a view call is invoked, because the cache is created early on and reused. This is also a step in the right direction for the longer term, where we probably want the contract runtime to stand on its own a little more and have the invoker of the transaction runtime be responsible for integrating the components together.
This makes the cloning and lifetime of the cache directory much more straightforward to reason about.
I had to finish this first before tackling the VMArtifact cache, since the VMArtifact cache might end up being managed by the `CompiledContractCache` (although most likely not) and I didn't want to be bogged down by being unable to make that choice. This PR should be ready to land in its current state. I was able to repurpose the previous tests that were using the RocksDB-based implementation to exercise this new cache instead, so I didn't need to spend too much time on that facet, although I would love many more tests for this functionality. I also kept the …
Approving to unblock. At a high level, looks OK. A detailed review from @wacban will still be good.
just a sanity check …
Good question. I think we should be able to handle …
Partial writes are not visible to the cache -- the writes are "committed" atomically. Regular signals (including SIGKILL) will at worst leave behind some stray temporary files.
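The commit scheme described here can be sketched as a temp-file write followed by an atomic rename (a minimal sketch under that assumption; `put_atomically` and the `.tmp` suffix are illustrative names, not the actual implementation):

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Sketch of a crash-safe cache write: write to a temporary file first,
/// then "commit" by renaming it onto the final path. A process killed
/// mid-write (even by SIGKILL) leaves at worst a stray temporary file;
/// the final path is either absent or contains the complete contents.
fn put_atomically(dir: &Path, name: &str, contents: &[u8]) -> io::Result<()> {
    let tmp = dir.join(format!("{name}.tmp"));
    let final_path = dir.join(name);
    let mut file = fs::File::create(&tmp)?;
    file.write_all(contents)?;
    file.sync_all()?; // flush data to disk before the rename makes it visible
    fs::rename(&tmp, &final_path) // rename within one filesystem is atomic on POSIX
}
```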
I debugged the integration test failures. Requires some adjustments to …
LGTM
mostly just requests for code comments
you may want to consider adding some metrics and logs to ease future debugging
```rust
/// This cache however does not implement any clean-up policies. While it is possible to truncate
/// a file that has been written to the cache before (`put` an empty buffer), the file will remain
/// in place until an operator (or somebody else) removes files at their own discretion.
```
I was just going to ask you about this. Why did you decide not to clean up the cache? Do you expect it to grow slowly enough to never actually cause trouble? Do you know what the total size of the cache would be today?
This is again something the previous rocksdb-based implementation does as well. You can check the corresponding column in rocksdb for the current numbers; I have no clue how to do that myself.
The size of this directory should end up being in the same ballpark in the long term.
It's around 10GB. That's not motivating enough for me to try to optimize.
cc @pugachAG for an implementation of a contract cache that we may want to reuse or build on top of for the stateless validation contract cache. This one is for compiled code, but perhaps that's even better?
Previously some of the tests implicitly isolated the cache between contracts by specifying distinct stores. With the new filesystem cache the contract caches must be distinct too in some of these tests.
@wacban It would be great to have a last-pass check on the "teach test env about contract caches" commit. If the CI passes (I believe it should) this will be landable. Thank you for the thorough review!
LGTM (looking at the test env commit)
```rust
    }
}

impl<C: CompiledContractCache> CompiledContractCache for &C {
```
Took me a while to notice that `&` sign there.
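The pattern the reviewer spotted can be reproduced in miniature: a blanket impl makes the trait available on shared references, so callers can pass either an owned cache or `&C` (illustrative names only, not the actual nearcore types):

```rust
use std::collections::HashMap;

/// Toy stand-in for the cache trait (not the real nearcore trait).
trait Cache {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

// The blanket impl in question: any `&C` delegates to the underlying `C`,
// so generic code bounded on `Cache` also accepts references.
impl<C: Cache> Cache for &C {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        (**self).get(key)
    }
}

/// A trivial in-memory cache to exercise the blanket impl.
struct MapCache(HashMap<String, Vec<u8>>);

impl Cache for MapCache {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
}

/// Generic consumer: works with `MapCache` by value or with `&MapCache`.
fn lookup<C: Cache>(cache: C, key: &str) -> Option<Vec<u8>> {
    cache.get(key)
}
```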
Codecov Report

Attention: Patch coverage is …
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master   #10791      +/-   ##
==========================================
- Coverage   71.63%   71.63%   -0.01%
==========================================
  Files         761      761
  Lines      152915   153067     +152
  Branches   152915   153067     +152
==========================================
+ Hits       109546   109643      +97
- Misses      38399    38438      +39
- Partials     4970     4986      +16
```

Flags with carried forward coverage won't be shown.
The missing coverage is largely in the generic code that delegates to other methods...
It's me again :) Would it make sense to remove that no-longer-used db column when we switch to the filesystem impl? Will there be any perf impact when a node switches to this impl? Would it make sense to warm it up, or is that not needed?
Formally the …
There will be an impact roughly equivalent to that of upgrading neard to a version that contains contract-runtime changes compared to the previous version – the node will compile contracts as they are called for the first time after the upgrade. This sort of thing used to happen somewhat frequently in the past without causing notable issues, so in principle I do not believe we need to treat this change specially in any way.