
[move] Loader V2 API integration for MoveVM #14075

Merged (1 commit) on Aug 3, 2024

@georgemitenkov (Contributor) commented Jul 22, 2024

Description

  1. Versioning the loader into V1 and V2: this makes integration easier, because flipping a single flag gives us the new behaviour.
  2. New loader implementation draft:
  • LoaderV2 does not store modules or scripts. Instead, the corresponding ScriptStorage and ModuleStorage are passed by reference to the loader APIs, which allows us to cache things on read.
  • LoaderV2 still has type caches that keep track of depth formulas, etc., like the old loader. These do not change behaviour on module upgrade, and IMO it is fine to keep them for simpler integration.
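To make the V1/V2 split concrete, here is a minimal, self-contained sketch. Every type in it (Module, ModuleStorage, LoaderV1, LoaderV2, the use_loader_v2 flag) is a hypothetical stand-in rather than the MoveVM API: V1 keeps its own module cache, while V2 holds no modules and reads through whatever storage reference the caller passes in.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-in for a loaded runtime module.
#[derive(Debug)]
pub struct Module {
    pub name: String,
}

/// Storage owned by the caller and passed to loader APIs by reference.
pub trait ModuleStorage {
    fn fetch_module(&self, name: &str) -> Option<Rc<Module>>;
}

// A plain map is enough to act as storage in this sketch.
impl ModuleStorage for HashMap<String, Rc<Module>> {
    fn fetch_module(&self, name: &str) -> Option<Rc<Module>> {
        self.get(name).cloned()
    }
}

/// V1 owns a module cache, so loaded modules live inside the loader.
#[derive(Default)]
pub struct LoaderV1 {
    module_cache: RefCell<HashMap<String, Rc<Module>>>,
}

/// V2 stores no modules; it only borrows the storage it is handed.
#[derive(Default)]
pub struct LoaderV2;

pub enum Loader {
    V1(LoaderV1),
    V2(LoaderV2),
}

impl Loader {
    /// One (hypothetical) flag decides which behaviour the VM gets.
    pub fn new(use_loader_v2: bool) -> Self {
        if use_loader_v2 {
            Loader::V2(LoaderV2)
        } else {
            Loader::V1(LoaderV1::default())
        }
    }

    pub fn load_module(&self, name: &str, storage: &impl ModuleStorage) -> Option<Rc<Module>> {
        match self {
            Loader::V1(v1) => {
                // Consult the loader-owned cache first, then fill it from storage.
                if let Some(m) = v1.module_cache.borrow().get(name) {
                    return Some(m.clone());
                }
                let m = storage.fetch_module(name)?;
                v1.module_cache.borrow_mut().insert(name.to_string(), m.clone());
                Some(m)
            },
            // Always read through storage; any caching is the storage's concern.
            Loader::V2(_) => storage.fetch_module(name),
        }
    }
}
```

With V2, the loader's answer can never drift from storage, because there is no second copy of the module to keep in sync.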

New data structures:

  1. LoaderV2 also has a V: Verifier type parameter, so that we can provide a custom verifier when loading a module.
  2. Loading a module calls fetch_or_create_verified_module. This function actually loads the transitive closure of dependencies, which is an implementation detail of the code cache (not handled here).
  3. Because we do not want to leak loader data structures into the code cache, the idea is for the loader to pass fetch_or_create_verified_module a function f: &dyn Fn(Arc<CompiledModule>) -> PartialVMResult<Module>. This function can capture the struct name cache (which maps struct names to identifiers), native functions, etc.
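A rough sketch of point 3, assuming simplified stand-in types (CompiledModule, Module, StructNameIndexMap, CodeCache, and a String error in place of PartialVMError), all hypothetical. The code cache only sees an opaque `&dyn Fn` builder, while the closure built by the loader captures the loader-side struct name index map. Rc and RefCell stand in for Arc and locks to keep the sketch single-threaded.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

pub type PartialVMResult<T> = Result<T, String>;

// Hypothetical deserialized (unverified) module and loader-side runtime module.
#[derive(Debug)]
pub struct CompiledModule {
    pub name: String,
    pub struct_names: Vec<String>,
}

#[derive(Debug)]
pub struct Module {
    pub name: String,
    pub struct_indices: Vec<usize>,
}

/// Code cache: knows nothing about loader internals such as struct name indices.
#[derive(Default)]
pub struct CodeCache {
    compiled: HashMap<String, Rc<CompiledModule>>,
    verified: HashMap<String, Rc<Module>>,
}

impl CodeCache {
    pub fn add_compiled(&mut self, cm: CompiledModule) {
        self.compiled.insert(cm.name.clone(), Rc::new(cm));
    }

    /// Returns the cached module, or builds it with the loader-supplied closure.
    pub fn fetch_or_create_verified_module(
        &mut self,
        name: &str,
        f: &dyn Fn(Rc<CompiledModule>) -> PartialVMResult<Module>,
    ) -> PartialVMResult<Rc<Module>> {
        if let Some(m) = self.verified.get(name) {
            return Ok(m.clone());
        }
        let cm = self
            .compiled
            .get(name)
            .cloned()
            .ok_or_else(|| format!("module {} not found", name))?;
        let module = Rc::new(f(cm)?);
        self.verified.insert(name.to_string(), module.clone());
        Ok(module)
    }
}

/// Loader-side state that the closure captures (hypothetical).
#[derive(Default)]
pub struct StructNameIndexMap {
    names: RefCell<Vec<String>>,
}

impl StructNameIndexMap {
    pub fn index_of(&self, name: &str) -> usize {
        let mut names = self.names.borrow_mut();
        if let Some(i) = names.iter().position(|n| n == name) {
            return i;
        }
        names.push(name.to_string());
        names.len() - 1
    }
}

pub fn load_module(
    cache: &mut CodeCache,
    index_map: &StructNameIndexMap,
    name: &str,
) -> PartialVMResult<Rc<Module>> {
    let build = |cm: Rc<CompiledModule>| -> PartialVMResult<Module> {
        // Verification would run here; this sketch only resolves struct names.
        let struct_indices = cm.struct_names.iter().map(|s| index_map.index_of(s)).collect();
        Ok(Module { name: cm.name.clone(), struct_indices })
    };
    cache.fetch_or_create_verified_module(name, &build)
}
```

Since only the closure crosses the boundary, the code cache never has to name StructNameIndexMap; the trade-off is that what the builder does is invisible in the cache's type signature.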

Alternative design

Alternatively, we could have a context shared between the loader and the code cache, something like this:

```rust
pub struct LoaderContext<V: Verifier> {
    // Native functions can be also here, but type caches are kept in the loader.
    pub(crate) struct_name_index_map: StructNameIndexMap,
    phantom_data: PhantomData<V>,
}

impl<V: Verifier> LoaderContext<V> {
    pub fn build_module(
        &self,
        cm: Arc<CompiledModule>,
        module_storage: &impl ModuleStorage,
    ) -> PartialVMResult<Module> {
        V::verify_module(cm.as_ref())?;
        let imm_dependencies = cm
            .immediate_dependencies_iter()
            .map(|(addr, name)| module_storage.fetch_verified_module(addr, name))
            .collect::<PartialVMResult<Vec<_>>>()?;
        V::verify_module_with_dependencies(
            cm.as_ref(),
            imm_dependencies.iter().map(|m| m.module()),
        )?;
        // Must be a private constructor, so that one can create module only via the context.
        Module::new_v2(module_storage, &self.struct_name_index_map, cm)
    }
}
```
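The same idea can be exercised end to end with toy types. This is a hedged sketch, not the proposed MoveVM code: CompiledModule, ToyVerifier, the `runtime` module and the HashMap-based storage are all hypothetical, and the struct name index map is omitted. It keeps the order from the snippet above (verify the module alone, resolve immediate dependencies, verify against them) and uses a `pub(super)` constructor so that a runtime Module can only be built through the context.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;

pub type PartialVMResult<T> = Result<T, String>;

/// Hypothetical deserialized module: a name plus its immediate dependencies.
pub struct CompiledModule {
    pub name: String,
    pub deps: Vec<String>,
}

pub trait Verifier {
    fn verify_module(m: &CompiledModule) -> PartialVMResult<()>;
    fn verify_module_with_dependencies<'a, I>(m: &CompiledModule, deps: I) -> PartialVMResult<()>
    where
        I: Iterator<Item = &'a CompiledModule>;
}

mod runtime {
    /// Runtime module whose constructor is visible only to the parent module,
    /// i.e. to `LoaderContext`, mirroring the "private constructor" comment.
    pub struct Module {
        name: String,
    }

    impl Module {
        pub(super) fn new(name: String) -> Self {
            Module { name }
        }

        pub fn name(&self) -> &str {
            &self.name
        }
    }
}

pub struct LoaderContext<V: Verifier> {
    phantom_data: PhantomData<V>,
}

impl<V: Verifier> LoaderContext<V> {
    pub fn new() -> Self {
        Self { phantom_data: PhantomData }
    }

    pub fn build_module(
        &self,
        cm: &CompiledModule,
        storage: &HashMap<String, CompiledModule>,
    ) -> PartialVMResult<runtime::Module> {
        // 1. Verify the module on its own.
        V::verify_module(cm)?;
        // 2. Resolve immediate dependencies (assumed already verified here).
        let deps = cm
            .deps
            .iter()
            .map(|d| storage.get(d).ok_or_else(|| format!("missing dependency {}", d)))
            .collect::<PartialVMResult<Vec<_>>>()?;
        // 3. Verify against those dependencies, then construct.
        V::verify_module_with_dependencies(cm, deps.into_iter())?;
        Ok(runtime::Module::new(cm.name.clone()))
    }
}

/// Toy verifier used only to exercise the flow above.
pub struct ToyVerifier;

impl Verifier for ToyVerifier {
    fn verify_module(m: &CompiledModule) -> PartialVMResult<()> {
        if m.deps.contains(&m.name) {
            return Err(format!("{} depends on itself", m.name));
        }
        Ok(())
    }

    fn verify_module_with_dependencies<'a, I>(m: &CompiledModule, deps: I) -> PartialVMResult<()>
    where
        I: Iterator<Item = &'a CompiledModule>,
    {
        // Toy linking check: every declared dependency must be present by name.
        let found: Vec<&str> = deps.map(|d| d.name.as_str()).collect();
        if m.deps.iter().all(|d| found.contains(&d.as_str())) {
            Ok(())
        } else {
            Err(format!("{} has unresolved dependencies", m.name))
        }
    }
}
```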

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Other (specify)

How Has This Been Tested?

Key Areas to Review

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

trunk-io bot commented Jul 22, 2024

⏱️ 1h 59m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-move-tests 15m 🟩
rust-move-tests 15m 🟩
rust-move-unit-coverage 13m 🟩
rust-move-unit-coverage 12m 🟩
rust-move-tests 11m 🟩
rust-move-tests 10m 🟩
rust-move-unit-coverage 8m 🟩
rust-move-unit-coverage 7m 🟩
rust-cargo-deny 7m 🟩🟩🟩🟩
general-lints 7m 🟩🟩🟩🟩
check-dynamic-deps 5m 🟩🟩🟩🟩
file_change_determinator 3m 🟩🟩🟩🟩🟩 (+11 more)
semgrep/ci 2m 🟩🟩🟩🟩🟩
permission-check 50s 🟩🟩🟩🟩🟩 (+11 more)
file_change_determinator 47s 🟩🟩🟩🟩
permission-check 43s 🟩🟩🟩🟩🟩 (+11 more)
permission-check 42s 🟩🟩🟩🟩🟩 (+11 more)
permission-check 39s 🟩🟩🟩🟩🟩 (+11 more)
rust-move-unit-coverage 1s


codecov bot commented Jul 22, 2024

Codecov Report

Attention: Patch coverage is 50.89216% with 633 lines in your changes missing coverage. Please review.

Project coverage is 59.0%. Comparing base (a13c2f0) to head (dc00eee).

Files Patch % Lines
...d_party/move/move-vm/runtime/src/storage/loader.rs 0.0% 244 Missing ⚠️
third_party/move/move-vm/runtime/src/loader/mod.rs 65.2% 164 Missing ⚠️
...rd_party/move/move-vm/runtime/src/storage/dummy.rs 0.0% 69 Missing ⚠️
aptos-move/aptos-vm/src/aptos_vm.rs 0.0% 42 Missing ⚠️
third_party/move/move-vm/runtime/src/session.rs 57.1% 33 Missing ⚠️
third_party/move/move-vm/runtime/src/data_cache.rs 54.8% 14 Missing ⚠️
...move-vm/runtime/src/storage/struct_type_storage.rs 40.0% 12 Missing ⚠️
...tos-move/aptos-vm/src/verifier/event_validation.rs 0.0% 10 Missing ⚠️
...ptos-move/aptos-vm/src/verifier/resource_groups.rs 0.0% 9 Missing ⚠️
...party/move/move-vm/runtime/src/native_functions.rs 69.2% 8 Missing ⚠️
... and 10 more
Additional details and impacted files
```diff
@@              Coverage Diff               @@
##           george/main   #14075     +/-   ##
==============================================
- Coverage         59.1%    59.0%   -0.1%     
==============================================
  Files              828      833      +5     
  Lines           201915   202672    +757     
==============================================
+ Hits            119416   119683    +267     
- Misses           82499    82989    +490     
```


Contributor

@gelash left a comment

does it make sense to add script_hash to CompiledScript or Script? it could also be smt we build e.g. in constructors.
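One way to read the script_hash suggestion is to compute the hash once in the constructor and store it next to the bytes. A minimal sketch with a hypothetical Script type; std's DefaultHasher is only a stand-in here, and a real cache key would want a cryptographic hash (e.g. SHA3-256) so that collisions are negligible.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical script wrapper that computes its hash once, at construction.
pub struct Script {
    bytes: Vec<u8>,
    script_hash: u64,
}

impl Script {
    pub fn new(bytes: Vec<u8>) -> Self {
        // Stand-in hash only; not collision-resistant like SHA3-256.
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        let script_hash = hasher.finish();
        Self { bytes, script_hash }
    }

    /// Cheap accessor: no rehashing on every cache lookup.
    pub fn script_hash(&self) -> u64 {
        self.script_hash
    }

    pub fn bytes(&self) -> &[u8] {
        &self.bytes
    }
}
```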

third_party/move/move-vm/runtime/src/storage/dummy.rs (outdated, resolved)
third_party/move/move-vm/runtime/src/storage/dummy.rs (outdated, resolved)
third_party/move/move-vm/runtime/src/storage/loader.rs (outdated, resolved)
third_party/move/move-vm/runtime/src/storage/dummy.rs (outdated, resolved)

```rust
let (data, bytes_loaded) = match loader {
    Loader::V1(_) => {
        let maybe_module = module_store.module_at(&ty_tag.module_id());
```
Contributor

A matter of style, but as a fan of fancy chaining I have to mention that we can compute the metadata directly by chaining .map_or_else(|| &[], |m| m.module().metadata());
On one hand, maybe_module isn't used elsewhere; on the other hand, it makes reading a bit clearer. Similarly, match might be more verbose, so up to you.

Contributor Author

I don't think we can use a map here, because then the lifetimes become weird... Let's see if I can refactor the code so that it works; map_or_else is nicer indeed.
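For reference, a compiling form of the chaining idea on simplified, hypothetical types (Module, Metadata, metadata_keys). It sidesteps the `&[]` inference issue with a `'static` empty slice and borrows through `as_deref()`, so the slice's lifetime is tied to the local `maybe_module` binding, which is exactly the constraint mentioned above.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for a module and its metadata entries.
#[derive(Debug, PartialEq)]
pub struct Metadata {
    pub key: Vec<u8>,
}

pub struct Module {
    metadata: Vec<Metadata>,
}

impl Module {
    pub fn new(metadata: Vec<Metadata>) -> Self {
        Self { metadata }
    }

    pub fn metadata(&self) -> &[Metadata] {
        &self.metadata
    }
}

// A 'static empty slice gives both branches of map_or_else the same type.
const NO_METADATA: &[Metadata] = &[];

/// The slice borrows from `maybe_module`, so that binding must outlive every
/// use of `metadata`; this is where returning the slice gets awkward.
pub fn metadata_keys(maybe_module: &Option<Arc<Module>>) -> Vec<Vec<u8>> {
    let metadata = maybe_module
        .as_deref()
        .map_or_else(|| NO_METADATA, Module::metadata);
    metadata.iter().map(|m| m.key.clone()).collect()
}
```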

```rust
// If we need to process aggregator lifting, we pass type layout to remote.
// Remote, in turn, ensures that all aggregator values are lifted if the resolved
// resource comes from storage.
self.remote.get_resource_bytes_with_metadata_and_layout(
```
Contributor

looks like we can move these calls out of the V1 and V2 branches? we just need to get different metadata?

Contributor Author

Hm, not really: we have different lifetimes here, because V2 returns a reference that lives as long as the module storage, while in V1 it lives only as long as the optional module. That is why it gets this ugly. Will try to refactor the code a bit to make it look nicer.
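One hypothetical way to keep a single call site despite the differing lifetimes is continuation-passing: each branch produces its metadata slice and hands it to the same closure. The types below (Loader, Storage, with_metadata) are invented for illustration and are not the actual data cache code; the sketch only shows that the V1 borrow can stay scoped to its branch while V2 borrows straight from storage.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical module with string metadata, for brevity.
pub struct Module {
    metadata: Vec<String>,
}

impl Module {
    pub fn metadata(&self) -> &[String] {
        &self.metadata
    }
}

pub enum Loader {
    V1,
    V2,
}

pub struct Storage {
    modules: HashMap<String, Arc<Module>>,
}

impl Storage {
    /// V1-style: an owned handle; borrowed metadata lives only as long as the handle.
    pub fn module_at(&self, name: &str) -> Option<Arc<Module>> {
        self.modules.get(name).cloned()
    }

    /// V2-style: the returned slice lives as long as the storage itself.
    pub fn fetch_metadata(&self, name: &str) -> Option<&[String]> {
        self.modules.get(name).map(|m| m.metadata())
    }
}

/// Hypothetical helper: both branches pass their metadata to one continuation,
/// so the follow-up call (e.g. the remote read) is written only once.
pub fn with_metadata<R>(
    loader: &Loader,
    storage: &Storage,
    name: &str,
    f: impl FnOnce(&[String]) -> R,
) -> R {
    match loader {
        Loader::V1 => {
            let maybe_module = storage.module_at(name);
            // `maybe_module` stays alive for the whole call to `f`.
            f(maybe_module.as_deref().map(Module::metadata).unwrap_or_default())
        },
        Loader::V2 => f(storage.fetch_metadata(name).unwrap_or_default()),
    }
}
```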

```rust
    PartialVMError::new(StatusCode::FUNCTION_RESOLUTION_FAILURE)
        .with_message(format!("Module {} doesn't exist", module_name))
})?;
// Note(George): V2 loads function directly to avoid this case completely!
```
Contributor

is whatever the code below was guarding against no longer relevant? if so, maybe we can add one sentence to sketch why.

Contributor Author

Added a sentence, but yes: V1 uses a cache which may be empty, yet it assumes that a module must be in the module cache when we load a function. This seems pretty fragile. In V2, we always fetch the module from module storage and cache on read.
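A minimal sketch of the V2 behaviour described here, assuming hypothetical types (CachingModuleStorage, load_function): function lookup always goes through module storage, which builds a module on first read and serves later reads from its cache, so there is no separate "module must already be cached" assumption to maintain.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical module: a name and the names of its functions.
pub struct Module {
    pub name: String,
    pub functions: Vec<String>,
}

/// Storage that builds a module on first read and caches it for later reads.
pub struct CachingModuleStorage {
    raw: HashMap<String, Vec<String>>, // stand-in for serialized on-chain code
    cache: RefCell<HashMap<String, Rc<Module>>>,
    pub raw_reads: Cell<usize>,
}

impl CachingModuleStorage {
    pub fn new(raw: HashMap<String, Vec<String>>) -> Self {
        Self { raw, cache: RefCell::new(HashMap::new()), raw_reads: Cell::new(0) }
    }

    pub fn fetch_module(&self, name: &str) -> Option<Rc<Module>> {
        if let Some(m) = self.cache.borrow().get(name) {
            return Some(m.clone());
        }
        let functions = self.raw.get(name)?.clone();
        self.raw_reads.set(self.raw_reads.get() + 1);
        let module = Rc::new(Module { name: name.to_string(), functions });
        self.cache.borrow_mut().insert(name.to_string(), module.clone());
        Some(module)
    }
}

/// V2-style lookup: always goes through module storage, never relying on a
/// module having been put into a loader-owned cache by an earlier step.
pub fn load_function(
    storage: &CachingModuleStorage,
    module_name: &str,
    function_name: &str,
) -> Result<(Rc<Module>, usize), String> {
    let module = storage
        .fetch_module(module_name)
        .ok_or_else(|| format!("Module {} doesn't exist", module_name))?;
    let index = module
        .functions
        .iter()
        .position(|f| f == function_name)
        .ok_or_else(|| format!("Function {}::{} doesn't exist", module_name, function_name))?;
    Ok((module, index))
}
```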

```diff
@@ -645,6 +667,15 @@ impl Interpreter {
                     .map_err(|err| err.to_partial())
             },
             NativeResult::LoadModule { module_name } => {
+                if let Loader::V2(_) = resolver.loader() {
```
Contributor

mhm, why do we need a new check?

Contributor Author

Below, we call check_dependencies_and_charge_gas. The current implementation uses the annoying allow_module_failure flag only for the first set of ids, because these dependencies may indeed not exist (e.g., a non-existent entry function), whereas dependencies further down the closure always exist. So we optionally check whether to allow a LINKER_ERROR, or whether it is an invariant violation.

Contributor Author

But actually yes, in this implementation I do not think we need this here, good point
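To illustrate the distinction being discussed, here is a hypothetical dependency traversal (traverse_and_charge_gas, not the actual check_dependencies_and_charge_gas): a missing root module maps to a linker error, while a module missing deeper in the closure is treated as an invariant violation. The flat per-module gas charge and the local StatusCode enum are simplifications for the sketch.

```rust
use std::collections::{HashMap, HashSet};

// Local stand-in for the VM's status codes.
#[derive(Debug, PartialEq)]
pub enum StatusCode {
    LinkerError,
    InvariantViolation,
    OutOfGas,
}

/// Hypothetical traversal of the dependency closure that charges a flat amount
/// of gas per visited module.
pub fn traverse_and_charge_gas(
    deps: &HashMap<&str, Vec<&str>>, // module -> immediate dependencies
    roots: &[&str],
    gas_per_module: u64,
    gas_left: &mut u64,
) -> Result<(), StatusCode> {
    let mut visited = HashSet::new();
    // Each entry carries whether it is a root (first set of ids) or not.
    let mut stack: Vec<(&str, bool)> = roots.iter().map(|r| (*r, true)).collect();
    while let Some((name, is_root)) = stack.pop() {
        if !visited.insert(name) {
            continue;
        }
        let imm = match deps.get(name) {
            Some(d) => d,
            // A root may legitimately not exist, e.g. a bad entry function target.
            None if is_root => return Err(StatusCode::LinkerError),
            // Anything reached transitively must exist.
            None => return Err(StatusCode::InvariantViolation),
        };
        *gas_left = gas_left.checked_sub(gas_per_module).ok_or(StatusCode::OutOfGas)?;
        stack.extend(imm.iter().map(|d| (*d, false)));
    }
    Ok(())
}
```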

@georgemitenkov georgemitenkov marked this pull request as ready for review July 26, 2024 15:06
@georgemitenkov georgemitenkov changed the title WIP: almost full loader V2 API integration for MoveVM [move] Loader V2 API integration for MoveVM Jul 27, 2024
Contributor

@gelash left a comment

Thanks a lot for this, George; the fact that you keep improving everything around your changes is amazing!

```diff
@@ -171,7 +172,7 @@ impl<'r> TransactionDataCache<'r> {
         Ok(change_set)
     }
 
-    pub(crate) fn num_mutated_accounts(&self, sender: &AccountAddress) -> u64 {
+    pub(crate) fn num_mutated_resources(&self, sender: &AccountAddress) -> u64 {
```
Contributor

is it resources? So it matches on the account and counts the mutated entries associated with it; I suppose that makes sense, and thanks for fixing! However:

  • let's also change the inner variable names?
  • total_mutated accounts is initialized with 1 because there is an account resource? That seems ugly: if we call it with some account that wasn't even relevant, it would still return 1? The 'sender' name maybe helps a bit here.

Contributor Author

Yes, because it uses data_map in its implementation and not modules. This nicely aligns with the fact that we will not store modules inside anymore👍

To be honest, I am not sure how useful this API is; it is only used in tests, but I know that not everyone is happy with removing public interfaces no matter what😄 But this seems so ugly... I agree: why do we assume the sender is mutated? My guess is that this must be the adapter's responsibility, since that is where we have senders.

Example of a test:

```rust
// The resource was published to "account1" and the sender's account
// (TEST_ADDR) is assumed to be mutated as well (e.g., in a subsequent
// transaction epilogue).
assert_eq!(sess.num_mutated_resources(&TEST_ADDR), 2);
```

But this is just wrong: the test assumes that something is mutated...
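A sketch of what a sender-agnostic count could look like, assuming a simplified, hypothetical data_map of per-account resource entries (AccountAddress, ResourceEntry and record are stand-ins). It counts only entries actually mutated under the given address, so an unrelated account yields 0 rather than an assumed 1.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for an account address and a cached resource entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct AccountAddress(pub u8);

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ResourceEntry {
    /// Loaded but not changed.
    Read,
    /// Written, created, or deleted during the session.
    Mutated,
}

#[derive(Default)]
pub struct TransactionDataCache {
    /// Per-account resources, keyed by a struct tag rendered as a string.
    data_map: BTreeMap<AccountAddress, BTreeMap<String, ResourceEntry>>,
}

impl TransactionDataCache {
    pub fn record(&mut self, addr: AccountAddress, tag: &str, entry: ResourceEntry) {
        self.data_map.entry(addr).or_default().insert(tag.to_string(), entry);
    }

    /// Counts only what actually changed under `addr`; no implicit +1 for a sender.
    pub fn num_mutated_resources(&self, addr: &AccountAddress) -> u64 {
        self.data_map.get(addr).map_or(0, |resources| {
            resources.values().filter(|e| **e == ResourceEntry::Mutated).count() as u64
        })
    }
}
```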

Contributor Author

@georgemitenkov, Jul 30, 2024

So I see two ways:

  • Remove this assumption about the sender. Move doesn't care about this or about the epilogue; this is clearly off.
  • Remove this API completely, or better, make it #[cfg(feature = "testing")] or something like that.

@gelash what do you think?

Contributor

let's see if we are missing something; otherwise, I'd purge it.

Contributor Author

cc @runtian-zhou @vgao1996 who might have some context about this

Contributor Author

aptos-move/framework/src/module_metadata.rs (resolved)
third_party/move/move-vm/runtime/src/session.rs (outdated, resolved)
```rust
.map_err(|err| self.attach_state_if_invariant_violation(err, &current_frame))?;
// Select which resolver to use: if we are executing in the context of the entrypoint
// function, we use the one provided by the caller.
let resolver = if self.call_stack.is_empty() {
```
Contributor

is this new logic? previously it worked through the Script / Module distinction, right? let's be careful here: it does seem we were creating resolvers every time for scripts, and now we re-use them? can the call stack be cleared, e.g. on abort, and then have unintended consequences? just asking.
Maybe we can encapsulate the logic somewhere (and add any other safeguards/checks if needed)?

Contributor Author

Yes, good point. I rewrote those ugly APIs by storing LoadedFunction everywhere and providing it with a scope. @runtian-zhou I think this looks cleaner, and the V2 loader API fits perfectly. WDYT? See the last commit in this PR (note: I did not polish certain types, etc., but added this code to give an idea).

Contributor Author

Update: this change is refactored into #14185
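A minimal sketch of the "store LoadedFunction with its scope" idea, using hypothetical types (LoadedFunctionOwner, Resolver, current_resolver): the resolver is derived from where the function came from, so it does not depend on call-stack state such as whether the stack is empty.

```rust
use std::rc::Rc;

pub struct Script {
    pub name: String,
}

pub struct Module {
    pub name: String,
}

/// Hypothetical owner: each loaded function remembers its script or module.
pub enum LoadedFunctionOwner {
    Script(Rc<Script>),
    Module(Rc<Module>),
}

pub struct LoadedFunction {
    pub owner: LoadedFunctionOwner,
    pub name: String,
}

/// Stand-in for the interpreter's resolver.
#[derive(Debug, PartialEq)]
pub enum Resolver {
    ForScript(String),
    ForModule(String),
}

impl LoadedFunction {
    /// The resolver comes from the function itself, not from call-stack state.
    pub fn resolver(&self) -> Resolver {
        match &self.owner {
            LoadedFunctionOwner::Script(s) => Resolver::ForScript(s.name.clone()),
            LoadedFunctionOwner::Module(m) => Resolver::ForModule(m.name.clone()),
        }
    }
}

/// Toy call stack: the resolver is read off the top frame's function.
pub fn current_resolver(call_stack: &[LoadedFunction]) -> Option<Resolver> {
    call_stack.last().map(LoadedFunction::resolver)
}
```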

third_party/move/move-vm/runtime/src/loader/mod.rs (outdated, resolved)
third_party/move/move-vm/runtime/src/loader/mod.rs (outdated, resolved)
```rust
) -> PartialVMResult<Self> {
    let module_id = function
        .module_id()
        .ok_or_else(|| {
```
Contributor

we could move this inside the module_id call, with an error saying "Function {} must have a module owner": not every function has an owner, but whenever we call module_id, it should.

Contributor Author

Hmmm, yes, but scripts still do not have one, and that is not really an error: there are a few debug use cases where we optionally print things. My take is that the function should not be owned, but LoadedFunction should, because it is the one that has the full context and can actually carry an Arc'd module or script. @runtian-zhou, what do you think?

I have tried that approach, and it requires changing types from Arc to LoadedFunction throughout; I was not convinced this is better than just special-casing scripts.
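For comparison, the suggestion above might look like the hypothetical helper below: module_id keeps returning an Option (scripts have no owner), and call sites that require an owner use a variant that produces the "Function {} must have a module owner" error. Function, ModuleId and owner_module_id are simplified stand-ins.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ModuleId(pub String);

pub struct Function {
    pub name: String,
    /// `None` for script functions, which have no owning module.
    pub module: Option<ModuleId>,
}

impl Function {
    /// Debug-friendly accessor: absence is not an error.
    pub fn module_id(&self) -> Option<&ModuleId> {
        self.module.as_ref()
    }

    /// Hypothetical helper for call sites that require a module owner.
    pub fn owner_module_id(&self) -> Result<&ModuleId, String> {
        self.module_id()
            .ok_or_else(|| format!("Function {} must have a module owner", self.name))
    }
}
```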

@georgemitenkov force-pushed the george/loader-v2 branch 3 times, most recently from f80b102 to 4b3f255 on July 30, 2024 14:14
Contributor

@runtian-zhou left a comment

My general take is that we don't need to version the code just for the sake of switching things over. The new loader should be functionally equivalent, so if keeping two versions at the same time makes the code harder to read and maintain, we don't have to keep both. WDYT?

```diff
@@ -60,6 +62,7 @@ impl<'r, 'l> Session<'r, 'l> {
         args: Vec<impl Borrow<[u8]>>,
         gas_meter: &mut impl GasMeter,
         traversal_context: &mut TraversalContext,
+        module_storage: &impl ModuleStorage,
```
Contributor

Would it make more sense to move the module_storage into Session? That way we could avoid changing the API. From an API standpoint I also think it makes sense, as we shouldn't expect modules loaded within a session to be altered.

Contributor Author

Not really, and this is something I deliberately do not want to do, mostly because of init_module and respawned sessions. If we capture it in the Move session, how do you support init_module? It is not possible unless you respawn the session or temporarily stash published modules inside the session (which is exactly what led to the existing problems).

The APIs can be made nice by introducing a "context" struct which you pass everywhere and which contains the gas, the module storage, etc., similar to Resolver.
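A rough sketch of the "context" struct idea, with hypothetical types throughout (ExecutionContext, an existence-only ModuleStorage, a unit-cost GasMeter): the session owns no module storage, so after publishing (e.g. for init_module) the same session can run against a new storage view simply by being handed a different context.

```rust
use std::collections::HashSet;

/// Hypothetical module storage that only answers whether a module exists.
pub trait ModuleStorage {
    fn module_exists(&self, name: &str) -> bool;
}

impl ModuleStorage for HashSet<String> {
    fn module_exists(&self, name: &str) -> bool {
        self.contains(name)
    }
}

pub struct GasMeter {
    pub remaining: u64,
}

pub struct TraversalContext {
    pub visited: Vec<String>,
}

/// Hypothetical bundle of everything a session call borrows but does not own.
pub struct ExecutionContext<'a, S: ModuleStorage> {
    pub gas_meter: &'a mut GasMeter,
    pub traversal_context: &'a mut TraversalContext,
    pub module_storage: &'a S,
}

/// The session itself holds no module storage.
pub struct Session {
    pub calls: u64,
}

impl Session {
    pub fn execute_function<S: ModuleStorage>(
        &mut self,
        module: &str,
        ctx: &mut ExecutionContext<'_, S>,
    ) -> Result<(), String> {
        if !ctx.module_storage.module_exists(module) {
            return Err(format!("Module {} doesn't exist", module));
        }
        // Unit cost per call, purely for illustration.
        ctx.gas_meter.remaining = ctx.gas_meter.remaining.checked_sub(1).ok_or("out of gas")?;
        ctx.traversal_context.visited.push(module.to_string());
        self.calls += 1;
        Ok(())
    }
}
```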

Contributor

@runtian-zhou left a comment

Personally, I would be hesitant to land the PR as is, because we are using dummy storage everywhere, meaning this V2 logic isn't actually being used or tested for now, right? Or did I miss anything?

@georgemitenkov force-pushed the george/loader-v2 branch 3 times, most recently from 7a9cbfa to 0c56eff on July 31, 2024 12:06
@georgemitenkov georgemitenkov mentioned this pull request Oct 23, 2024
22 tasks
Labels
None yet
Projects
None yet
3 participants