rtt.fresh will not provide data abstraction #178

Closed
RossTate opened this issue Jan 14, 2021 · 17 comments

@RossTate
Contributor

Many programs want to be able to share references without exposing all the fields contained in those references. Typically this is done by exporting the type of those references abstractly or "opaquely". But with rtt.canon, another module can guess the type of those references (say by looking at the module implementation) and downcast them, thereby gaining access to all their fields. Things like rtt.fresh and the data restriction were added to prevent this, but I've figured out how to dodge those measures.

Suppose you have a reference of some abstract imported type, i.e. ref $A. You want to get its contents and have reason to believe $A is t. Here is what you do:

(func $reveal (param $target (ref $A)) (result (ref t))
    (local.get $target)
    (rtt.canon (ref (struct (mut (ref $A)))))
    (struct.new_with_rtt)
    ;; now there's a (ref (struct (mut (ref $A)))) containing $target
    (rtt.canon (ref (struct (mut (ref t)))))
    (ref.cast)
    ;; now there's a (ref (struct (mut (ref t)))) containing $target
    ;; this will succeed if $A indeed represents t
    (struct.get 0)
    ;; $target is now on the stack, typed as (ref t), with its contents revealed
)

(Note that you can do the reverse to forge references as well.)
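To make the attack concrete, here is a toy Python model of canonical (structural) RTTs. It is a sketch under stated assumptions, not the Wasm GC semantics. Types are nested tuples, and the imported abstract type `$A` is mapped to its hidden definition `t` by a hypothetical `resolve` helper, playing the role of the engine after linking. `rtt_canon`, `struct_new_with_rtt` and `ref_cast` are hypothetical stand-ins for the instructions of the same names. Because a box's RTT is computed from the resolved structure, a box built at `(ref $A)` cannot be told apart from one built at `(ref t)`. That gives both the reveal and the forge.

```python
# Toy model only: hypothetical Python stand-ins for rtt.canon,
# struct.new_with_rtt and ref.cast. Types are nested tuples and "$A" is an
# imported type that is abstract statically but resolved at run time.
from dataclasses import dataclass

# Assumed hidden definition chosen by the exporter: $A is really
# t = (struct (mut i32)).
T = ("struct", ("mut", "i32"))
IMPORTS = {"$A": T}


@dataclass(frozen=True)
class Rtt:
    type_: object  # fully resolved structural type


@dataclass
class Struct:
    rtt: Rtt
    fields: list


def resolve(ty):
    """Substitute imported names by their definitions, as an engine would."""
    if isinstance(ty, str):
        return IMPORTS.get(ty, ty)
    return tuple(resolve(part) for part in ty)


def rtt_canon(ty):
    # Canonical RTTs: structurally equal after resolution => equal RTTs.
    return Rtt(resolve(ty))


def struct_new_with_rtt(values, rtt):
    return Struct(rtt, list(values))


def ref_cast(obj, rtt):
    if obj.rtt != rtt:
        raise TypeError("cast failure (trap)")
    return obj


def box_of(ty):
    """(struct (mut (ref ty)))"""
    return ("struct", ("mut", ("ref", ty)))


def reveal(target):
    """(ref $A) -> (ref t): box at $A, cast the box to a box of t, unbox."""
    boxed = struct_new_with_rtt([target], rtt_canon(box_of("$A")))
    return ref_cast(boxed, rtt_canon(box_of(T))).fields[0]


def forge(value):
    """The reverse: build a t from scratch and launder it into a (ref $A)."""
    fake = struct_new_with_rtt([value], rtt_canon(T))
    boxed = struct_new_with_rtt([fake], rtt_canon(box_of(T)))
    return ref_cast(boxed, rtt_canon(box_of("$A"))).fields[0]


if __name__ == "__main__":
    secret = struct_new_with_rtt([42], rtt_canon(T))  # made by the exporter
    print(reveal(secret).fields[0])  # prints 42
```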


What will it take to fix this so that WebAssembly can provide the sort of data abstraction that is standard in other VMs? Unfortunately, the notion of private types that was added to the Type Imports proposal does not interact well with casting. That is, when you export a class hierarchy "C extends B extends A" from an OO module, you want others to be able to cast between these nominal types; you just don't want others to be able to cast them to their hidden structural types. As such, you can't simply wrap a struct with C's private type, as that won't be castable to B's or A's private types. So we'd have to go further and allow modules to define hierarchies of private types, as well as extend the hierarchies of other modules with additional private types.
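A toy Python sketch of the subtyping gap. It assumes, for illustration only, that a private type behaves like an unforgeable brand. `wrap` seals a value, only the module holding the `PrivateType` object can `unwrap` it, and other modules can only ask whether a sealed value carries a given brand. `PrivateType`, `Sealed`, `is_castable` and the `parent` link are hypothetical names, not proposal syntax. With flat brands (no `parent`), a `C` object sealed as `C` is not recognized as a `B`. Adding a parent link, in the spirit of the hierarchies of private types suggested here, allows nominal casts without exposing contents.

```python
# Toy model: a "private type" as an unforgeable brand. PrivateType, Sealed,
# is_castable and the parent link are hypothetical, not proposal syntax.
class Sealed:
    __slots__ = ("brand", "_value")

    def __init__(self, brand, value):
        self.brand = brand
        self._value = value


def is_castable(sealed, target):
    """Nominal cast check that any module may perform without seeing contents."""
    brand = sealed.brand
    while brand is not None:
        if brand is target:
            return True
        brand = brand.parent
    return False


class PrivateType:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # None models flat private types

    def wrap(self, value):
        return Sealed(self, value)

    def unwrap(self, sealed):
        # Only the defining module holds this object, so only it can unwrap.
        if not is_castable(sealed, self):
            raise TypeError(f"not a {self.name}")
        return sealed._value
```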

This seems to suggest that WebAssembly needs a nominal static type system regardless of whether it has a structural static type system.

@rossberg
Member

Yes, you are right! Thanks for the example. Though fixing this through nominal typing would only hack around the symptoms.

The deeper reason for this problem actually is one that has always bothered me about the current design, namely that rtt.canon as defined breaks parametricity of type imports. That is bad for many reasons, both semantically and implementation-wise, and should be avoided. This is one example of how it can have bad consequences.

The only way to avoid the parametricity breach is to make rtt.canon compositional and require explicit RTT operands for each subcomponent type -- or at least for the ones that are not statically transparent. That would prevent your example, because you'd not be able to construct the RTT for the struct without explicitly providing the RTT for $A, and presumably there would be only one (or even none) available.
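A toy Python model of a compositional rtt.canon. The assumptions are ours, not a specification. Every imported (abstract) component type needs an explicit RTT operand. Each `Rtt` carries the static type the validator knows it at, so an importer's own `rtt_canon(T)`, statically an `(rtt t)`, cannot stand in for an `(rtt $A)`. `RTT_A`, `abstract_names` and `substitute` are hypothetical names. The last check in the test shows the flip side: if the exporter does hand out its RTT for `$A`, the structural cast works again.

```python
# Toy model of a compositional rtt.canon: every abstract (imported) component
# needs an explicit RTT operand of the right static type. Names are hypothetical.
from dataclasses import dataclass, field

T = ("struct", ("mut", "i32"))  # assumed hidden definition of $A
IMPORTED = {"$A"}                # names the importer only sees abstractly


@dataclass(frozen=True)
class Rtt:
    type_: object                          # resolved structure (run time)
    static: object = field(compare=False)  # type the validator knows it at


def abstract_names(ty):
    """Imported type names occurring anywhere in ty."""
    if isinstance(ty, str):
        return {ty} & IMPORTED
    return set().union(*(abstract_names(part) for part in ty))


def substitute(ty, rtts):
    if isinstance(ty, str):
        return rtts[ty].type_ if ty in rtts else ty
    return tuple(substitute(part, rtts) for part in ty)


def rtt_canon(ty, component_rtts=None):
    component_rtts = component_rtts or {}
    env = {}
    for name in abstract_names(ty):
        operand = component_rtts.get(name)
        # Stand-in for validation: the operand must statically be (rtt $A).
        if operand is None or operand.static != name:
            raise TypeError(f"need an (rtt {name}) operand")
        env[name] = operand
    return Rtt(substitute(ty, env), static=ty)


def box_of(ty):
    return ("struct", ("mut", ("ref", ty)))


# Only the exporting module can mint this; it may choose never to export it.
RTT_A = Rtt(T, static="$A")
```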

@RossTate
Contributor Author

I can employ the same technique using call_indirect, as discussed in WebAssembly/design#1343. Previously you objected to the suggestions I made to fix this problem, and you then opposed the Call Tags proposal that provides an abstraction-safe alternative to call_indirect. Are you lifting those objections now and making the necessary changes to the Type Imports proposal? If so, it'd be nice to fix call_indirect and externref as well so that externref would be virtualizable, since all that takes is a simple validation change for just call_indirect, and your previous objection to this fix was on the same basis that you seem to be retracting.
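One plausible reading of the call_indirect variant, as a toy Python model. It assumes a run-time signature check that compares signatures only after imported types are resolved. `Func`, `call_indirect`, `resolve` and the identity-function trick are illustrative assumptions, not taken from design#1343. The importer defines an identity function at `[(ref $A)] -> [(ref $A)]` and calls it indirectly at `[(ref $A)] -> [(ref t)]`. Statically these signatures differ, but the resolved ones coincide.

```python
# Toy model: a call_indirect check that compares signatures *after* imported
# types are resolved. All names are hypothetical.
T = ("struct", ("mut", "i32"))  # assumed hidden definition of $A
IMPORTS = {"$A": T}


def resolve(ty):
    if isinstance(ty, str):
        return IMPORTS.get(ty, ty)
    return tuple(resolve(part) for part in ty)


class Func:
    def __init__(self, params, results, body):
        self.params, self.results, self.body = params, results, body


def call_indirect(table, index, params, results, *args):
    f = table[index]
    if resolve((f.params, f.results)) != resolve((params, results)):
        raise TypeError("indirect call signature mismatch (trap)")
    return f.body(*args)


REF_A, REF_T = ("ref", "$A"), ("ref", T)

# The importer's own identity function, declared at [(ref $A)] -> [(ref $A)].
identity_A = Func((REF_A,), (REF_A,), lambda x: x)
table = [identity_A]


def reveal(target):
    # Called at [(ref $A)] -> [(ref t)]: the result is now typed as (ref t).
    return call_indirect(table, 0, (REF_A,), (REF_T,), target)
```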

As for your fix to rtt.canon, it sounds like you're referencing the extension to rtt.canon that you provided in the Post-MVP. Are you saying that we should extend the MVP with type functions so that we can accommodate imported types?

Note that, assuming the above fixes are made, rtt.fresh has the following advantages over rtt.canon: it provides data abstraction, runs in constant time, and does not require the RTTs of the imported types it makes use of.
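A toy Python model of generative RTTs, assuming a depth-indexed "display" of ancestors as the subtype-check technique (a common VM approach; whether an engine would use it here is an assumption). Creating a fresh RTT needs no canonicalization (hash-consing) of the structural type. A cast is one bounds check plus one identity comparison, independent of hierarchy depth. `FreshRtt`, `rtt_fresh`, `Obj` and `ref_cast` are hypothetical names. Per the opening example, this alone does not stop boxing through rtt.canon. The model only illustrates the properties claimed once rtt.canon is fixed.

```python
# Toy model of generative RTTs (rtt.fresh) with a depth-indexed "display"
# for subtype checks. FreshRtt, rtt_fresh, Obj and ref_cast are hypothetical.
from dataclasses import dataclass


class FreshRtt:
    """Each instance is a distinct identity, even for identical structures."""

    def __init__(self, struct_type, parent=None):
        self.struct_type = struct_type
        # Ancestors from the root down to (and including) this RTT.
        self.display = (parent.display if parent else ()) + (self,)
        self.depth = len(self.display) - 1


def rtt_fresh(struct_type, parent=None):
    # No canonicalization (hash-consing) of struct_type is needed.
    return FreshRtt(struct_type, parent)


@dataclass
class Obj:
    rtt: FreshRtt
    fields: list


def ref_cast(obj, target):
    # One bounds check plus one identity comparison, whatever the depth.
    display = obj.rtt.display
    if target.depth < len(display) and display[target.depth] is target:
        return obj
    raise TypeError("cast failure (trap)")
```

Note that a structurally identical but independently created RTT (a "lookalike") does not pass the cast. That is the generative behavior rtt.canon lacks.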

@RossTate
Contributor Author

@rossberg Can you indicate if you are planning to make the necessary changes to the relevant proposals to avoid the need for nominal types here?

@rossberg
Member

rossberg commented Feb 4, 2021

Somebody pointed out to me that there isn't necessarily an issue with this. The purpose of generative RTTs would be to enable compilers to piggy-back Wasm casts to implement certain language-level casts. It's not to achieve cross-language encapsulation. That would be asking for a full abstraction property, which is not something Wasm has ever supported in general -- certainly not when you're using linear memory, and it's not been a goal of the GC proposal to magically provide that either.

Safe encapsulation is a separate feature, and is currently proposed to be provided by private types as in the type import/export proposal. Of course, there is an argument to be made that both these use cases could potentially be served by a single feature, which is why I have held back on adding generative RTTs so far. However, that would require significant complications to private types. That seems unwise before we have some practical experience with them.

(I'm still concerned about the more general problem of lack of parametricity for type imports, though. We could restrict call_indirect to fix that, like I think you suggested at some point. But then we'd need RTTs to work around that restriction, which are currently nicely delimited to the GC part of the language.)

@RossTate
Contributor Author

RossTate commented Feb 4, 2021

Thanks for responding. But the response seems to mischaracterize the problem, the common practices, and the alternative you have provided.

Runtime-cost-free data encapsulation is standard in (the memory-safe subsets of) typed languages and VMs. It is regularly used for security (keeping secrets and preventing forgery) and for preventing unwanted dependencies on implementation specifics (i.e. abstraction). It is also something that JavaScript does not make so easy/cheap, and so is a runtime-cost-free way to make WebAssembly a preferred target over JavaScript.

It is a severe mischaracterization to suggest it requires any "magic". More accurately, it is the highly non-standard "magic" features that you have insisted WebAssembly support that create the problem in the first place, features such as rtt.canon. These features make it possible to circumvent abstractions. So when you say "safe encapsulation is a separate feature", that is because you have gone out of your way to add features that make encapsulation not be safe by default.

As for "private types", that feature is also non-standard, as other languages/VMs have safe encapsulation by default. It is a patch over a problem you have created. (And, no, it is not at all like newtype or abstype, for reasons I have given in WebAssembly/design#1394 (comment).) In addition to incurring run-time overhead, it is not well fit for common applications of encapsulation, such as capabilities with subtypes or classes with private fields and inheritance.

So how would you like to proceed? The options raised so far seem to be:

  1. Add nominal types (with a subtyping hierarchy)
  2. Update rtt.canon and call_indirect to respect parametricity
  3. Add text to the Overview indicating that runtime-cost-free data encapsulation will not be supported, with suggestions for how to support features typically implemented with runtime-cost-free data encapsulation using "private types" instead

@conrad-watt
Contributor

conrad-watt commented Feb 4, 2021

Runtime-cost-free data encapsulation is standard in (the memory-safe subsets of) typed languages and VMs.

If one is allowed to carve out a safe language subset (I assume this is something like the JVM without reflection), would it be equally legitimate to say that Wasm (edit: with rtt.fresh) does provide data abstraction for a subset of the language where rtt.canon is not used (at least in the way described by the OP)?

@RossTate
Contributor Author

RossTate commented Feb 4, 2021

The parenthetical was carving out things that generally exit the language/VM through means that are unsafe and/or only permitted in trusted settings, i.e. backdoors in trusted settings. Examples are OCaml's Obj.magic, JNI for the JVM, and Java/C#'s reflection mechanisms (which have explicit means to configure specifically private-variable access). They are not really comparable to rtt.canon or call_indirect, as @rossberg has regularly argued that those should be core features of WebAssembly and central to its ecosystem, which makes it hard to argue that they are only backdoors for trusted settings.

@conrad-watt
Contributor

conrad-watt commented Feb 4, 2021

In a hypothetical scheme where a nominally typed source language is implemented using rtt.fresh, I would consider abstraction-breaking uses of rtt.canon to be somewhat analogous to Obj.magic/reflection. In this scenario, one is taking compiled code and explicitly composing on a (hand-written?) module which breaks the data abstraction property which would otherwise hold.

As @rossberg said, there is still the question of whether rtt.fresh should be the mechanism through which we implement data abstraction. It may be that we pursue some richer post-MVP nominal/private types which are entirely separate. If we do end up going the rtt.fresh route, then there might be a (probably contentious) discussion about whether some SecurityManager style flag should allow rtt.canon to be forbidden in certain situations.

@tlively
Member

tlively commented Feb 4, 2021

The purpose of generative RTTs would be to enable compilers to piggy-back Wasm casts to implement certain language-level casts. It's not to achieve cross-language encapsulation. That would be asking for a full abstraction property, which is not something Wasm has ever supported in general -- certainly not when you're using linear memory, and it's not been a goal of the GC proposal to magically provide that either.

Full abstraction might be overkill, but it has always been possible to ensure confidentiality and integrity of data in a WebAssembly module by choosing to only export a carefully designed interface. It would be a shame if the expressiveness of module interfaces designed to preserve confidentiality and integrity lagged behind the expressiveness of the full language, especially given how important these security properties are to the most ambitious visions of WebAssembly's future.

@RossTate
Contributor Author

RossTate commented Feb 4, 2021

As an example, with the features that @rossberg has laid out, an OO language using separate compilation seems to be unable to prevent modules from monkey-patching the method implementations in v-tables/i-tables. Even with reflection and all security restrictions turned off, the JVM and CLI ensure that modules cannot change method implementations of classes they do not define.

The JS API prevents even untyped JS from accessing non-imported/exported memory and globals of module instances. It seems inconsistent to not prevent even typed wasm code from accessing non-imported/exported fields of GC references (where even the rtt is not imported/exported), especially when such prevention can be done without any run-time overhead.

@conrad-watt
Contributor

conrad-watt commented Feb 4, 2021

@RossTate I would have thought that the relevant v-table fields would be declared immutable. Is there a reason this doesn't work?

EDIT: I suppose, depending on the implementation of interfaces, there may be an issue in that we currently lack immutable arrays?

@RossTate
Contributor Author

RossTate commented Feb 5, 2021

With the initialization plans @rossberg laid out in #189 (comment), you cannot have immutable v-table fields (or i-table arrays) in the presence of separate compilation. In separate compilation, when you create a v-table for a new class, you first pass that v-table to the initializer of your superclass. Since that initializer can be in another module (or since your own class might be extended by another module), v-table initialization will need to be done by using struct.set on v-table fields (that are defaulted to null), so the fields cannot be declared immutable. (More generally, only fields/methods of final/sealed classes will be able to be non-nullable and/or immutable.)
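A toy Python model of the initialization order described here. It assumes one class per separately compiled module, and v-table "slots" stand in for struct fields that start out null and are filled in by struct.set. The module split and all names (`VTable`, `animal_init_vtable`, `dog_init_vtable`, `call_method`) are hypothetical. Because the slots must stay mutable for the superclass initializer in another module to fill them, nothing in the model stops a third party holding the v-table from overwriting a slot.

```python
# Toy model of separately compiled v-table initialization. The module split
# and all names are hypothetical; slots stand for nullable, mutable fields.
class VTable:
    def __init__(self, n_slots):
        self.slots = [None] * n_slots  # defaulted to null


# --- module "animals" (class Animal: slot 0 = speak) ---
def animal_init_vtable(vt):
    vt.slots[0] = lambda obj: "..."


# --- module "dogs", compiled separately (Dog extends Animal, adds slot 1) ---
def dog_init_vtable(vt):
    animal_init_vtable(vt)            # superclass initializer, other module
    vt.slots[0] = lambda obj: "woof"  # override: a struct.set on slot 0
    vt.slots[1] = lambda obj: "fetch"


def call_method(vt, index, obj):
    return vt.slots[index](obj)


if __name__ == "__main__":
    dog_vt = VTable(2)
    dog_init_vtable(dog_vt)
    print(call_method(dog_vt, 0, None))  # prints woof
    # Any other module that can name this struct type can do the same
    # struct.set, which is the monkey-patching concern.
    dog_vt.slots[0] = lambda obj: "patched"
    print(call_method(dog_vt, 0, None))  # prints patched
```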

Addendum: The fact that these details matter is illustrative of how weak the encapsulation properties of the (Post-)MVP are. There are other reasons why v-tables will want to be mutable (but only by the module that created the v-table), such as on-demand loading (in which case the v-table will be filled with stubs that get replaced after the dependent code is loaded) or JITing (wherein the v-table needs to be updated with the more optimized/specialized implementation).
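A toy Python model of the on-demand-loading case. A slot initially holds a stub; the first call "loads" the real implementation and patches the slot, so later calls go straight to it. `VTable`, `make_stub`, `load_area_impl` and `LOAD_LOG` are hypothetical. The simulated load is just a function call. The patching store is exactly the kind of mutation that should be reserved for the owning module.

```python
# Toy model of on-demand loading: a v-table slot starts as a stub that loads
# the real implementation, then patches the slot. All names are hypothetical.
class VTable:
    def __init__(self, n_slots):
        self.slots = [None] * n_slots


LOAD_LOG = []  # records each (simulated) load of the dependent code


def load_area_impl():
    LOAD_LOG.append("area")
    return lambda obj: obj["w"] * obj["h"]


def make_stub(vt, index, loader):
    def stub(obj):
        impl = loader()
        vt.slots[index] = impl  # the mutation only the owning module should do
        return impl(obj)
    return stub
```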

@rossberg
Member

rossberg commented Feb 9, 2021

@tlively:

Full abstraction might be overkill, but it has always been possible to ensure confidentiality and integrity of data in a WebAssembly module by choosing to only export a carefully designed interface.

But that is still the case. Exporting a non-private reference to an untrusted party is not safe, just like exporting your memory isn't. If you want to maintain confidentiality, don't do either.

To pass a reference to an untrusted party while maintaining confidentiality, you would use a private type.

@RossTate:

Runtime-cost-free data encapsulation is standard in (the memory-safe subsets of) typed languages and VMs. It is regularly used for security (keeping secrets and preventing forgery) and for preventing unwanted dependencies on implementation specifics (i.e. abstraction).

Care to give an example of a cost-free encapsulation mechanism in a mainstream VM? I'm not aware of any. You have to wrap the data into either an object or a closure, and both are a long way from being free. And in the JVM, this isn't even safe, since it can be circumvented by reflection. The CIL has trust levels to control that, but it's a global-ish mechanism that doesn't provide per-object encapsulation properly.

Private types are equivalent to wrapping the data into an object, but without the loopholes.

@RossTate
Contributor Author

RossTate commented Feb 9, 2021

Care to give an example of a cost-free encapsulation mechanism in a mainstream VM? I'm not aware of any.

The JS API for WebAssembly ensures that JS cannot access non-exported fields (e.g. globals, memories, tables) of wasm modules even though JS is untyped, has a direct reference to the instance object, and is the higher-privileged (e.g. can catch traps) embedding language for wasm.

OCaml has cost-free encapsulation except for the one feature that also lets you convert an arbitrary integer into an address.

SML and Haskell have cost-free encapsulation.

The CLI and the JVM have cost-free encapsulation with the appropriate security settings and/or declarations of attributes.

And in the JVM, this isn't even safe, since it can be circumvented by reflection.

I don't find pointing to weaknesses in other systems to be a particularly compelling argument, especially when those weaknesses exist solely to support other features that we do not provide (such as reflection-based meta-programming libraries). Furthermore, as mentioned above, even with reflection you cannot change the method implementations in v-tables in the JVM/CLI, but you can in the MVP.

So you're essentially justifying weaknesses of the MVP by pointing out bad properties of other systems, then throwing away the advantages that were why those bad properties were present in those systems to begin with, discarding the complementary threat-mitigating features for untrusted settings, and making the weaknesses of those bad properties even worse.

Private types are equivalent to wrapping the data into an object, but without the loopholes.

As discussed in the OP, this is not true. Private types do not respect subtyping.


To be clear, it is completely possible to design the MVP such that Java/C#/Kotlin compiled-to-wasm modules can share direct references to their objects with other wasm modules and even with JS without those other modules being able to access private fields or mutate v-tables without any run-time cost, in a way that respects subtyping, and in a way that supports separate compilation and class inheritance. The compiled-to-wasm modules can even control what is accessible/mutable through reflection mechanisms in the compiled-to-wasm language runtime.

All it takes to have this is to restrict the overpowered encapsulation-breaking features of the MVP so that they are no longer encapsulation-breaking. We know from other systems that these restrictions are still sufficiently expressive. Can you explain why you are opposed to this?

@rossberg
Member

Let's separate user-facing languages from VMs here.

For the latter, e.g. CLI and JVM, I have no idea what mechanism you are referring to. Objects certainly aren't free.

For the former, I'm not sure what your point is. Cost-free encapsulation in languages like SML, OCaml and Haskell relies on their use of a uniform representation. There are also no casts in these languages (ignoring unchecked Obj.magic and friends), so there is no tangible relation to RTTs.

If your concern is compilation of these languages to Wasm, then that can be mapped to GC types just fine. That does not require full abstraction, no more than compiling C to linear memory does.

If your concern is cross-language interop, then we are effectively talking about a form of FFI (from the perspective of a single language) at that point. That typically uses different mechanisms and representations, and private types serve that.

I don't find pointing to weaknesses in other systems to be a particularly compelling argument

You made a claim about systems like it, and this refutes it, so shrug?

Private types do not respect subtyping.

Yes, but a completely different use case. What cross-language(!), FFI-level uses of subtyping do you envision? I can imagine some, but nothing desperately needed in the MVP. It's not clear whether relying on subtyping would be a good idea in a language-agnostic interface anyway.

@titzer
Contributor

titzer commented Feb 23, 2021

Sorry for being late to the discussion, but we seem to be going backwards. I gave a presentation some months back on Jawa, my prototype of running Java with separate compilation and late binding on top of WebAssembly. One of the clear findings of that line of research is that late binding of lowered code just doesn't work. I think this issue has veered off topic with mutable v-tables and monkey-patching across module boundaries. Even with data abstraction across modules with proper RTTs, this is still going to be a problem because you've fundamentally exposed implementation details that should not be exposed. It's inescapable that you need to defer lowering until link time.

@tlively
Member

tlively commented Apr 4, 2022

I'm closing this issue for now because it does not seem actionable at this time. If we do discuss bringing back generative RTTs, we should be mindful of relevant past discussions such as this one, though.

@tlively closed this as completed Apr 4, 2022
5 participants