Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Any formal execution definition of exception handling? #87

Closed
sys5867 opened this issue Aug 1, 2019 · 70 comments
Closed

Any formal execution definition of exception handling? #87

sys5867 opened this issue Aug 1, 2019 · 70 comments

Comments

@sys5867
Copy link

sys5867 commented Aug 1, 2019

Hello.

I'm interested in formal execution definition of WASM with exception handling proposal included.

What I now think is that the try catch end block semantics with throw instruction can be described in reduction rules, similar to PLDI'17 MVP spec paper.

Also, I heard from @aheejin that @rossberg has done some work on the formal spec of the proposal.

Are there in-progress works related to formal definition of exception handling?

@rossberg
Copy link
Member

rossberg commented Aug 1, 2019

There is nothing written up in publishable form yet. So let me just dump the rules as an extension to the PLDI paper (or rather, the somewhat updated CACM version) in ASCII art here. Hope that helps answering your question. :)

Abstract Syntax

Value Types

t ::= ... | exnref

Exception Types

et ::= t*

Instructions

e ::= ... | throw x | rethrow | try ft e* catch e* end | br_on_exn l x

Exceptions

exn ::= ex* exception et | ex* exception et im

Modules

mod ::= module ... exn*

Typing

Contexts

C ::= {..., exn et*}

Rules

C_exn(x) = t*
----------------------------
C |- throw x : t1* t* -> t2*

--------------------------------
C |- rethrow : t1* exnref -> t2*

ft = t1* -> t2*
C, label t2* |- e1* : t1* -> t2*
C, label t2* |- e2* : exnref -> t2*
-----------------------------------
C |- try ft e1* catch e2* end : ft

C_label(l) = C_exn(x)
-------------------------------------
C |- br_on_exn l x : exnref -> exnref

Reduction

Stores

S ::= {..., exn ei*}

Exception Instances

ei ::= {type et}

Module Instances

m ::= {..., exn a*}

Values

v ::= ... | exn a v*

Administrative Instructions

e ::= ... | throw a | catch_n{e*} e* end | exn a v*

Throw Contexts

T ::= v* [_] e* | label_n{e*} T end | catch_n{e*} T end | frame_n{F} T end

Rules

F; throw x  -->  F; throw a
  (iff F_exn(x) = a)

v^n (try ft e1* catch e2* end)  -->  catch_m{e2*} (label_m{} v^n e1* end) end)
  (iff ft = t1^n -> t2^m)

catch_m{e*} v^m end  -->  v^m

S; F; catch_m{e*} T[v^n (throw a)] end  -->  S; F; label_m{} (exn a v^n) e* end
  (iff S_exn(a) = {typ t^n})

(exn a v*) rethrow  -->  v* (throw a)

ref.null rethrow  -->  trap

F; (exn a v*) (br_on_exn l x)  -->  F; v* (br l)
  (iff F_exn(x) = a)

F; (exn a v*) (br_on_exn l x)  -->  F; (exn a v*)
  (iff F_exn(x) =/= a)

F; ref.null (br_on_exn l x)  -->  F; trap

@sys5867
Copy link
Author

sys5867 commented Aug 16, 2019

Thank you for the response!
I have additional questions about the semantics provided.

In the proposal, br_on_exn pops the exnref values from the stack and instead pushes the exception's argument values on top of the stack, if given exception index matches. The typing rule for br_on_exn then should assume exnref value on the top, and result in t* values afterwards. In other words, the typing rule should be,

C_label(l) = C_exn(x) = t*
-------------------------------------------
C |- br_on_exn l x : exnref -> t*

Also in the reduction rule, exception values are represented as exn a v*. Reduction rules for rethrow and br_on_exn assumes that exception arguments v* resides in operand stack. In implementation, where exception argument values are actually stored? From my understanding, only exnref value stays at the stack and exception arguments are stored in linear memory space, where the exnref value points to. Is this correct interpretation, or am I missing something?

@rossberg
Copy link
Member

No, the typing rule for br_on_exn is correct as given. Keep in mind that the values are only pushed onto the stack if the branch is taken, while the typing rule describes the stack manipulation of linear execution.

Exnref is an abstract (reference) type. It is an implementation detail how and where its argument values are stored. It is separate from any linear memory.

@rossberg
Copy link
Member

@sys5867, sorry, as was just pointed out to me by @ioannad, there indeed was a bug in the typing rules for br_on_exn: the t* in the conclusion should not be there, the overall type should just be exnref -> exnref. Fixed now.

aheejin added a commit that referenced this issue Mar 5, 2020
This PR adds the dependency to multi-value to the exception handling proposal text and to the README. 

I wrote an explanation of this dependency on the proposal text, but it's easier to see this once the verification and execution steps of `br_on_exn` and of `try` blocks are written out, as done [here](#87 (comment)) by @rossberg :

Validation:

```
ft = t1* -> t2*
C, label t2* |- e1* : t1* -> t2*
C, label t2* |- e2* : exnref -> t2*
-----------------------------------
C |- try ft e1* catch e2* end : ft

C_label(l) = C_exn(x) = t*
-------------------------------------
C |- br_on_exn l x : exnref -> exnref
```

Execution: 

```
v^n (try ft e1* catch e2* end)  -->  catch_m{e2*} (label_m{} v^n e1* end) end)
  (iff ft = t1^n -> t2^m)
S; F; catch_m{e*} T[v^n (throw a)] end  -->  S; F; label_m{} (exn a v^n) e* end
  (iff S_exn(a) = {typ t^n})

F; (exn a v*) (br_on_exn l x)  -->  F; v* (br l)
  (iff F_exn(x) = a)
```

Concerning the functionality of `try`-`catch` blocks, note especially the passing of `v^n` values into a `label_m{}`.
Concerning the functionality of `br_on_exn`, note especially the execution step resulting in a `br` instruction.
ioannad added a commit to ioannad/exception-handling that referenced this issue May 30, 2020
Following the informal formal spec by Andreas Rossberg :
WebAssembly#87 (comment)
and the discussions in the issues.
@ioannad
Copy link
Collaborator

ioannad commented Jun 4, 2020

@rossberg , with the addition of throw contexts, invoke might end up in a state where it neither succeeded with values nor has trapped, but is some throw context with a throw waiting for its catching try block. The simplest example is when invoking the function wasm_throw defined below:

(module  
   (type (func))
   (exception (export "exn") (type 0))
   (func (export "wasm_throw") (type 0)
      throw 0))

Invoking wasm_throw gets "stuck" in the state store; frame_1{F'} label_1 {} (throw tag) end end, where tag is the event address of exn, and F' the frame with the above module's instance.

I suppose this is the case meant by "the embedder defines how to handle the uncaught exception". But in this case it is not true any more that "Reduction terminates when a thread’s instruction sequence has been reduced to a result, that is, either a sequence of values or to a trap." (e.g., in the definition of evaluation contexts).

Should some other form of "result" (which is not a sequence of values or a trap) be defined to cover such cases of unresolved throw contexts?

@rossberg
Copy link
Member

rossberg commented Jun 4, 2020

@ioannad, yes, correct: with exceptions we will naturally need to extend the notion of result to include throw a. And the rules for reducing evaluation contexts will also need a couple of extra rules to get there.

Basically, it all needs to mirror what we have for traps. Except that there also is the try rule to intercept a throw.

ioannad pushed a commit to ioannad/exception-handling that referenced this issue Jun 6, 2020
… (WebAssembly#87)

Change the test generators to use `ref.func` and remove `passive`.

At some point we'll want to remove the generators, but for let's try to
maintain them.
ioannad pushed a commit to ioannad/exception-handling that referenced this issue Jun 6, 2020
Per the vote on WebAssembly#69, this PR removes subtyping from the proposal. List of changes:

* Syntax:
  - remove `nullref` type
  - rename `anyref` type to `externref`
  - extend `ref.null` and `ref.is_null` instructions with new immediate of the form `func` or `extern` (this will later have to generalise to a `constype` per the [typed references proposal](https://github.com/WebAssembly/function-references))
* Typing rules:
  - `ref.null`, `ref.is_null`: determine reference type based on new immediate
  - `select`, `call_indirect`, `table.copy`, `table.init`: drop subtyping
  - `br_table`: revert to rule requiring same label types
  - `elem` segment: drop subtyping
  - `global` import: drop subtyping (link time)
* Remove subtyping rules and bottom type.
* Revert typing algorithm (interpreter and spec appendix).
* JS API:
  - remove `"nullref"`
  - rename `"anyref"` to `"externref"`
* Scripts:
  - rename `ref` result to `ref.extern`
  - rename `ref.host` value to `ref.extern`
  - drop subtyping from invocation type check
* JS translation:
  - extend harness with separate eq functions for each ref type
* Adjust tests:
  - apply syntax changes
  - remove tests for subtyping
  - change tests exercising subtyping in other ways
ioannad pushed a commit to ioannad/exception-handling that referenced this issue Jun 6, 2020
This PR adds the dependency to multi-value to the exception handling proposal text and to the README.

I wrote an explanation of this dependency on the proposal text, but it's easier to see this once the verification and execution steps of `br_on_exn` and of `try` blocks are written out, as done [here](WebAssembly#87 (comment)) by @rossberg :

Validation:

```
ft = t1* -> t2*
C, label t2* |- e1* : t1* -> t2*
C, label t2* |- e2* : exnref -> t2*
-----------------------------------
C |- try ft e1* catch e2* end : ft

C_label(l) = C_exn(x) = t*
-------------------------------------
C |- br_on_exn l x : exnref -> exnref
```

Execution:

```
v^n (try ft e1* catch e2* end)  -->  catch_m{e2*} (label_m{} v^n e1* end) end)
  (iff ft = t1^n -> t2^m)
S; F; catch_m{e*} T[v^n (throw a)] end  -->  S; F; label_m{} (exn a v^n) e* end
  (iff S_exn(a) = {typ t^n})

F; (exn a v*) (br_on_exn l x)  -->  F; v* (br l)
  (iff F_exn(x) = a)
```

Concerning the functionality of `try`-`catch` blocks, note especially the passing of `v^n` values into a `label_m{}`.
Concerning the functionality of `br_on_exn`, note especially the execution step resulting in a `br` instruction.
ioannad added a commit to ioannad/exception-handling that referenced this issue Jun 17, 2020
…sal.

This adds the [informal formal core spec](WebAssembly#87 (comment))
laid out by Andreas Rossberg, and the core spec outlined in the
[proposal overview](https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md).

I have aimed for this to be as complete as possible for the above
"informal formal spec" and proposal overview, meaning I added prose
for all the relevant parts in syntax, validation, execution, binary
format, text format, and the appendix, as well as an example for
throw contexts.
ioannad added a commit to ioannad/exception-handling that referenced this issue Jun 17, 2020
This adds:

- the informal formal core spec laid out by Andreas Rossberg: WebAssembly#87 (comment) and
- the core spec outlined in the proposal overview: https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md

I have aimed for this to be as complete as possible for the above
"informal formal spec" and proposal overview, meaning I added prose
for all the relevant parts in syntax, validation, execution, binary
format, text format, and the appendix, as well as an example for
throw contexts.

I also updated the README.md to:

- remove mention to multi-value which is now part of the main spec, and
- add a build status icon for the exception-handling modified spec.
ioannad added a commit to ioannad/exception-handling that referenced this issue Jun 17, 2020
This adds:

- the informal formal core spec laid out by Andreas Rossberg: WebAssembly#87 (comment) and
- the core spec outlined in the proposal overview: https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md

I have aimed for this to be as complete as possible for the above
"informal formal spec" and proposal overview, meaning I added prose
for all the relevant parts in syntax, validation, execution, binary
format, text format, and the appendix, as well as an example for
throw contexts.

I also updated the README.md to:

- remove mention to multi-value which is now part of the main spec, and
- add a build status icon for the exception-handling modified spec.
Ms2ger pushed a commit that referenced this issue Jul 28, 2020
Included in this PR:

- Detailed core formal spec additions aiming to fully implement:
  + the "informal formal" core spec laid out by Andreas @rossberg here: #87 (comment) and
  + the core spec outlined in the [proposal overview](https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md), while removing the mention of "events" everywhere but in the binary format ,
  + prose for the above in: syntax, validation, execution, binary format, text format, and the appendix, as well as an example for throw contexts.
- Travis-ci build status icon to `README.md`. The contents of this PR get built and deployed on my fork's github page successfully, using [the howto instructions](https://github.com/WebAssembly/proposals/blob/master/howto.md#setting-up-travis-and-github-page).

Not included in this PR:

- Neither the new values representing unresolved throw contexts, nor the additional rules for the execution contexts, which we discussed in #87, are added here. I plan to add these separately.
- No JS API/Web API changes.

Side comments:

- In [document/core/text/types.rst](https://ioannad.github.io/exception-handling/core/text/types.html#text-refedtype): I'm not sure whether `refedtype ::= ... | event ⇒ exnref` should have `event` there or something else (perhaps `exception`?) However, since currently the only events are exceptions, this is probably not really an issue.
- The "informal formal spec" defines the administrative instruction which represents an exception package as `exnref`, i.e.,  the same name as the type. I called this administrative instruction `ref.exn` instead for two reasons;  to match the style of `ref.null` and `ref.extern`, and to avoid ambiguity in the discussions.
- I removed multi-value mentions from `README.md` because multi-value is now part of the main spec. For the same reason I didn't add "multi-value" to the list of included proposals at the top of the core spec landing page.
@ioannad
Copy link
Collaborator

ioannad commented Oct 1, 2020

EDIT: A reworked version of this formal spec overview is now added as a file with PR #143 .

I leave the text below for reference, but please note this spec below is not correct.


3rd proposal formal spec

This is an initial attempt to clarify the additions to formal spec in the new EH proposal (as of September 18, 2020).

I wrote this "informal formal spec" roughly in the style of the previous informal formal spec above.

The abstract syntax was given by @aheejin. This is my understanding of the typing and reduction steps for the new (old) instructions, mainly based on the the 1st proposal and several discussions in issues.

In this I assume rethrow has no immediate, as it is the current stand. This could be easily modified, whichever way we decide to go with rethrow.

I have a couple of questions inline below, but I opened separate issues for those, to keep this issue about fixing the formal spec.

Abstract Syntax

Exception Types

exntype ::= [t*] -> []

Instructions

instr ::= ... | throw x | rethrow
        | try bt instr* (catch exnidx instr*)+ end
        | try bt instr* (catch exnidx instr*)* catch_all instr* end
        | try bt instr* catch_br l
        | try bt instr* unwind instr* end

Exceptions (definitions)

exn ::= export* exception exntype  | export* exception exntype import

Modules

mod ::= module ... exn*

Typing

To verify that a try...catch_br l instruction refers to a label introduced by a try-catch or try-catch_br block (I'll call this a try-catch label), this introduces a type attribute to labels in the validation context, which attribute is set to try_catch when the label is a try-catch label and empty otherwise. The original notation label [t*] will then be a shortcut for label {result [t*], type ε}.
Labels introduced by try-unwind blocks do not get marked with the try_catch attribute.

To validate rethrow which can only appear inside a catch block, introduce a new list of currently caught exceptions, denoted by caught. I think of caught as a mark that we are inside a catch block.

Contexts

C ::= {..., (exn exntype)*, caught*}

Rules

C_exn(x) = [t*] -> []
--------------------------------
C |- throw x : [t1* t*] -> [t2*]
C_caught =/= ε
--------------------------------
C |- rethrow : [t1*] -> [t2*]
bt = [t1*] -> [t2*]
C, label {result [t2*], type try_catch} |- instr* : [t1*] -> [t2*]
(C, caught, label [t2*] |- instr_i* : [] -> [t2*])^n
(C, caught, label [t2*] |- instr'* : [] -> [t2*])?
------------------------------------------------------------------
try bt instr* (catch x_i instr_i*)^n (catch_all instr'*)? end : bt
bt = [t1*] -> [t2*]
C, label {result [t2*], type try_catch} |- instr* : [t1*] -> [t2*]
C_label(l).type = try
------------------------------------------------------------------
try bt instr* catch_br l : bt
bt = [t1*] -> [t2*]
C, label [t2*] |- instr_1* : [t1*] -> [t2*]
C, label [] |- instr_2* : [] -> []
-------------------------------------------
try bt instr_1* unwind instr_2* : bt

The last rule above could be modified to allow the unwind block to return values, for example a default value that should be returned when the instructions of the try block throw an exception. This seems unneccessary because unwind probably just concerns side effects (open/close a file, alter a mutable global, etc). (Issue #129)

Reduction

In order to describe the interaction of try...catch_br l with try...unwind...end, an administrative instruction caught_m is added, to help identify the destination of a throw (when/if found) before performing the stack unwinding. So caught_m marks a suspended state, where a throw destination is found, and the unwinding is about to happen, but as we unwind we must make sure we perform any cleanup forms when exiting administrative unwind frames.

To make sure we don't execute any instructions that appear after the throw, a further administrative instruction throw_point is added alongside with caught_m.

These new administrative instructions caught_m and throw_point are not necessary if catch_br l does not trigger any unwind-cleanup code between itself and its destination label l. (Issue #130)

Moreover, apart from throw contexts I added "protected contexts", to talk about the case when an exception is thrown and caught outside a try ... unwind instruction surrounding the throw.

Throw contexts now include catch_br{l} T end to distinguish protected contexts by their use of unwind.

Stores

S ::= {..., exn exninst*}

Exception Instances

exninst ::= {type exntype}

Module Instances

m ::= {..., (exn a)*}

Administrative Instructions

instr ::= ... | throw a | catch_n{a instr*}*{instr*} instr* end
        | catch_br{l} instr* end | unwind{instr*} instr* end
        | caught_m{a val^n}{instr*} instr* end | throw_point

Protected Contexts

P ::= val* [_] instr* | label_n{instr*} P end | unwind{instr*} P end |
    | catch_br{l} P end | frame_n{F} P end

Throw Contexts

T ::= ... | catch_br{l} T end

Rules

F; throw x  -->  F; throw a  (iff F_exn(x) = a)

F; caught_m{a val^n}{instr_1* rethrow instr_2*]} T[throw_point] end
  --> F; instr_1* val^n (throw a)


val^n (try bt instr* (catch x_i instr_i*)* (catch_all instr'*)^k end)
  -->  catch_m{a_i instr_i*}*{instr''*} (label_m{} val^n instr* end) end
  (iff bt = [t1^n] -> [t2^m] and (if k=1 and instr'*=ε then instr''* = nop)
  and (if k=1 and inst'*=/=ε then instr''* = instr'*) and (if k=0 then instr''*=ε).

S; F; catch_m{a_i instr_i*}*{instr'*} P[val^n (throw a)] end
  -->  S; F; caught_m{a val^n}{val^n instr_i*} P[throw_point] end
  (iff S_exn(a) = {type [t^n]->[]} and a_i = a)

S; F; catch_m{a_i instr_i*}*{instr*} P[val^n (throw a)] end
  -->  S; F; caught_m{a val^n}{instr*} P[throw_point] end
  (iff S_exn(a) = {type [t^n]->[]} and instr =/= ε and for every i, a_i =/= a)


val^n (try bt instr* catch_br) --> catch_br{l} (label_m{} val^n instr* end) end
  (iff bt = [t^n] -> [t^m])

catch_m{a_i instr_i*}*{instr*} (label_m{} B^l[ catch_br{l} (P[val^n (throw a)]) end] end) end
  --> caught_m{a val^n}{instr'*} (label_m{} B^l[ P[throw_point] ] end) end


val^n (try bt instr* unwind instr'* end) --> unwind{instr'*} (label_m {} val^n instr* end) end
  (iff bt = [t^n] -> [t^m])


F; caught_m{a val^n}{instr*} ( P[unwind{instr'*} T[throw_point] end]) end
  --> caught_m{a val^n}{instr*} P[instr'* throw_point] end

F; caught_m{a val^n}{instr*} T[throw_point] end --> label_m{} instr* end

F; caught_m{a val^n}{instr*} val^m end  -->  val^m

I would be happy if anyone could point out things that are wrong or that I misunderstood. (cc. @aheejin)

@RossTate
Copy link
Contributor

RossTate commented Oct 2, 2020

I can't tell what the semantics of rethrow is when it occurs inside a try inside a catch. Based on the implications of the discussion in #126, try (throw $exn) catch $exn (try rethrow catch $exn instr* end) should reduce to throw $exn rather than instr*. That is, the rethrown exception is rethrown from the place where it was caught rather than from the location of the rethrow instruction. This semantics would also address forwards-compatibility concerns.

@rossberg
Copy link
Member

rossberg commented Oct 5, 2020

Good stuff. It's great to have a semantics written down and see the open questions crystalise more clearly.

It is unfortunate that we need ad-hoc flags and markers in the typing rules to deal with the lack of composability in this version of the proposal, but I see no way around it. The typing rules seem right. As for the type of the unwind body, I think in principle we could allow a polymorphic result type, since it essentially ends in a branch, but that seems useless and would just complicate the operational semantics.

I'm a bit more confused about the operational semantics. Can you explain the role of the new P contexts and their distinction from T? I am not sure why catch_br appears in both -- or, in fact, in either of them. For example, the reduction rules for throw would seem to allow ignoring intermediate catch_br, such that

(try $outer
  (try $inner
    (try
      (throw $e)
      (catch_br $outer)
    )
    (catch ...)  (1)
  )
  (catch ...)  (2)
)

would go to (1), not (2) as it should.

F; caught_m{a val^n}{instr_1* rethrow instr_2*]} T[throw_point] end
  --> F; instr_1* val^n (throw a)

Hm, this looks odd. I believe it should be something like

F; caught_m{a val^n}{instr*} T[rethrow] end
  --> F; caught_m{a val^n}{instr*} T[val^n (throw a)] end
val^n (try bt instr* (catch x_i instr_i*)* (catch_all instr'*)^k end)
  -->  catch_m{a_i instr_i*}*{instr''*} (label_m{} val^n instr* end) end
  (iff bt = [t1^n] -> [t2^m] and (if k=1 and instr'*=ε then instr''* = nop)
  and (if k=1 and inst'*=/=ε then instr''* = instr'*) and (if k=0 then instr''*=ε).

Can you explain the need for the complex side condition on this one? I would have expected something like

 val^n (try bt instr* (catch x_i instr_i*)* (catch_all instr'*)^? end)
   -->  catch_m{a_i instr_i*}*{instr'*}^? (label_m{} val^n instr* end) end
   (iff bt = [t1^n] -> [t2^m])
catch_m{a_i instr_i*}*{instr*} (label_m{} B^l[ catch_br{l} (P[val^n (throw a)]) end] end) end
  --> caught_m{a val^n}{instr'*} (label_m{} B^l[ P[throw_point] ] end) end

Something is missing here, I think: the outer catch first needs to check if it actually handles the exception. The instr'* on the rhs is coming out of nowhere, but should be the matched handler. Moreover, a catch_br could also target another catch_br, according to the typing rules, so something inductive is required.

There also does not seem to be a rule for the case where a catch does not handle a thrown exception, and it instead has to find an outer handler.

I suspect this all will get extremely complicated (or even impossible) with the current approach. The only way I can think of to keep the rules tractable is by constructing the continuation incrementally. That could be done by reifying it as an evaluation context in the administrative throw instruction that gets extended. For example,

instr ::= ... | throw{E} a

F; throw $x  -->  F; throw{[]} a  (iff F_exn(x) = a)

catch_m{a_i instr_i*}*{instr*} (label_m{} B^l[ catch_br{l} T[val^n (throw{E} a)] end] end) end
  --> val^n (throw{caught_m{a val^n}{instr'*} (label_m{} B^l[ T[E] ] end) end} a)

S; F; catch_m{a_i instr_i*}*{instr'*} T[val^n (throw{E} a)] end
  -->  S; F; caught_m{a val^n}{val^n instr_i*} T[E[throw_point]] end
  (iff S_exn(a) = {type [t^n]->[]} and a_i = a)

and so on.

But in fact, if I read the answer to #130 correctly (I'm not sure I fully understand what the intended interaction is), then this could be simplified by not gathering contexts in the throw, but merely the sequence of unwind instructions to execute once the handler is found. That would also allow to get rid of the throw_point:

instr ::= ... | throw_n{instr*} a

S; F; val^n (throw $x)  -->  F; throw_n{} a  (iff F_exn(x) = a and S_exn(a) = {type [t^n]->[]})

unwind{instr'*} T[val^n (throw_n{instr*} a)] end
  --> val^n (throw_n{instr* instr'*} a)

catch_m{a instr*}*{instr'*}? (label_m{} B^l[ catch_br{l} T[val^n (throw_n{instr''*} a')] end] end) end
  --> catch_m{a instr*}*{instr'*}? val^n (throw_n{instr''*} a') end

catch_m{a instr*}*{instr'*} val^n (throw_n{instr''*} a') end
  --> instr''* (caught_m{a val^n} (instr*)*[i] end)
  (iff a* = a^i a' a*)

This looks simpler and would clearly encode the intention that the unwind phase follows the catches.

However, I'm not sure what is supposed to happen if an unwind block itself throws... Should the handler still be active, although it already caught? In that case, some additional wrapping around instr''* would be required in the last rule, and it may get complicated again.

F; caught_m{a val^n}{instr*} ( P[unwind{instr'*} T[throw_point] end]) end
  --> caught_m{a val^n}{instr*} P[instr'* throw_point] end

I believe the enclosing caught_m and P is redundant in this rule.

@ioannad
Copy link
Collaborator

ioannad commented Oct 6, 2020

Thank you for the feedback, @rossberg and @RossTate. Sorry for the delay, I was tied up until today (and btw had to miss yesterday's SG meeting as well).

@RossTate my understanding of rethrow is that it creates a new exception, that's why I specified that the rethrown exception is rethrown from the place of the rethrow instruction. Am I misunderstanding something?

@rossberg thank you so much for the detailed review! Some quick remarks before I start reworking the formal spec to incorporate your suggestions and comments.

Can you explain the role of the new P contexts and their distinction from T?

My intention was to distinguish the case when a unwind block is involved. Your suggestions and the conclusion of #130 might make these unnecessary though.

the reduction rules for throw would seem to allow ignoring intermediate catch_br, such that

(try $outer
 (try $inner
   (try
     (throw $e)
     (catch_br $outer)
   )
   (catch ...)  (1)
 )
 (catch ...)  (2)
)

would go to (1), not (2) as it should.

My understanding is that any instruction after a throw is dead code, is that wrong?

I really like your suggestion of the new administrative throw{E} a, this might simplify things a lot. I'll work through the semantics and update my comment when I'm ready with it.

val^n (try bt instr* (catch x_i instr_i*)* (catch_all instr'*)^k end)
  -->  catch_m{a_i instr_i*}*{instr''*} (label_m{} val^n instr* end) end
  (iff bt = [t1^n] -> [t2^m] and (if k=1 and instr'*=ε then instr''* = nop)
 and (if k=1 and inst'*=/=ε then instr''* = instr'*) and (if k=0 then instr''*=ε).

Can you explain the need for the complex side condition on this one?

This was an attempt to distinguish the case when there is no catch_all, wrt the rule

S; F; catch_m{a_i instr_i*}*{instr*} P[val^n (throw a)] end
  -->  S; F; caught_m{a val^n}{instr*} P[throw_point] end
  (iff S_exn(a) = {type [t^n]->[]} and instr =/= ε and for every i, a_i =/= a)

About throw_point, I think the conclusion of #130 makes it unnecessary anyway. I might be able to explain this better when I reworked the rules with your suggestions and my new understanding of the operational semantics.

@rossberg
Copy link
Member

rossberg commented Oct 6, 2020

the reduction rules for throw would seem to allow ignoring intermediate catch_br, such that

(try $outer
 (try $inner
   (try
     (throw $e)
     (catch_br $outer)
   )
   (catch ...)  (1)
 )
 (catch ...)  (2)
)

would go to (1), not (2) as it should.

My understanding is that any instruction after a throw is dead code, is that wrong?

No, but the catch_br here is part of the try, not an instruction after throw. (Ah, the text format actually requires writing this as (try (do (throw $e)) (catch ...)), which makes that clearer.)

@RossTate
Copy link
Contributor

RossTate commented Oct 6, 2020

@RossTate my understanding of rethrow is that it creates a new exception, that's why I specified that the rethrown exception is rethrown from the place of the rethrow instruction. Am I misunderstanding something?

@ioannad But in what context/continuation is that exception thrown? The answer affects how this example behaves: try (throw $exn) catch $exn (try rethrow catch $exn instr* end) end. The reasoning in #126 for removing the index only works if the continuation for the rethrown exception is whatever the continuation of the outer try/catch is rather than the continuation of the rethrow instructions itself.

@ioannad
Copy link
Collaborator

ioannad commented Oct 6, 2020

No, but the catch_br here is part of the try, not an instruction after throw. (Ah, the text format actually requires writing this as (try (do (throw $e)) (catch ...)), which makes that clearer.)

@rossberg oh, right, of course! I guess the notation and indentation threw me off.

But in what context/continuation is that exception thrown? The answer affects how this example behaves: try (throw $exn) catch $exn (try rethrow catch $exn instr* end) end. The reasoning in #126 for removing the index only works if the continuation for the rethrown exception is whatever the continuation of the outer try/catch is rather than the continuation of the rethrow instructions itself.

@RossTate I'm not sure I understand what reasoning you mean. I think the operational semantics I gave for rethrow are equivalent to the ones of the 2nd proposal. Is this not correct? I'm reworking the reduction steps anyway, but since rethrow is an already implemented instruction, I am reluctant to change its operational semantics, at least until #126 and #127 are decided.

@rossberg
Copy link
Member

rossberg commented Oct 6, 2020

Ah yes, in the presence of two-phase unwind (which effectively is a limited form of resumption), rethrow has to be more explicit about maintaining the original throw point. I think that semantics can only be modelled correctly if the throw context or unwind instructions is reified in the exception like suggested above, and also remembered in the caught form. It's similar to how the semantics for effect handlers works, except that the effect value is second-class.

I think something like

F; caught_m{a val^n}{E}{instr*} T[rethrow] end
  --> F; val^n (throw{E} a)

or similarly with instr* instead of E. (Quick answer for lack of time, might not be exactly right, but you probably see the idea.)

This is mostly unrelated to crippling the scoping capabilities of rethrow, though.

@RossTate
Copy link
Contributor

RossTate commented Oct 6, 2020

@rossberg It sounds like you are saying my example, try (throw $exn) catch $exn (try rethrow catch $exn instr* end) end, should reduce to throw $exn. Can you confirm?

@ioannad
Copy link
Collaborator

ioannad commented Oct 14, 2020

But as you said, you did not intend try-unwind as a primitive for finally, although I think @RossTate probably did.

Yes it looks like it, but that assumes a new instruction he proposed, something called unwinding branch. I was arguing, if he wants finally, adding finally directly would be more intuitive than adding his "unwind branch" that changes the semantics of unwind. (I'm not arguing we should add finally; I was arguing if someone really needs finally functionality, adding it directly might be better. But this is not the point of this comment anyway)

Thank you for the clarification, @aheejin, I am having a hard time understanding the discussion in #124. I will direct any further comments on finally/unwind in that issue instead.

Different people here are all talking about different points, so it's been confusing and it was hard to understand what exactly the problem related to the current spec is, if any.

I agree that mixing up the issues is very confusing. I'll post any further questions in their respective issues, and leave this issue for clarifications on the formal notation.

@rossberg
Copy link
Member

rossberg commented Oct 14, 2020

Thanks for the clarifications, @aheejin. I can see how this all came about. But I still feel rather uncomfortable about the general turn we have taken wrt hypothetical two-phase EH. My impression is that the current direction is based on multiple implicit assumptions that all are somewhat questionable, especially:

  1. Languages that expose two-phase EH (or other use cases) all share a sufficiently similar semantics that we can bake into Wasm. -- There hasn't been much evidence for that. @phoe's explanations about CL may suggest the opposite.

  2. Two-phase EH can only be implemented by the engine. -- I don't think that's correct. If you interpret throw as "unwind-to" and exception tags as target ids, then you can employ the current mechanism as the unwind phase and implement the handler search phase completely in user space with a simple shadow handler stack (see below).

The current direction OTOH seems to take us down a road where we end up building in special support for specific ad-hoc high-level language features. That is not aligned well with Wasm's goal of being a low-level and language-neutral engine.

Taking a step back, obviously, there has to be a primitive mechanism for non-local control transfer/abort, and basic throw/catch as in the current proposal can be viewed as just that. In a way, it's just a glorified and well-behaved longjmp with unwind. But given that, everything more complex seems better suited to be expressed in user space.

To wit, let me sketch how one could, quite easily, compile 2PEH with filters under the current proposal, without any use of try-unwind:

  1. Have the language's RTS define a (per-thread) table that represents its handler stack:

    (type $handler-func (func (param $exn i32)))
    (table $handlers (ref $handler-func))  ;; plain funcref would work as well
    (global $top+1 (mut i32) (i32.const 0))
    
    ;; These helpers would probably be inlined by the producer
    (func $push-handler (param $h (ref $handler-func))
      ;; omitting an overflow check here
      (table.set $handlers (global.get $top+1) (local.get $h))
      (global.set $top+1 (i32.add (global.get $top+1) (i32.const 1)))
    )
    (func $pop-handler
      (global.set $top+1 (i32.sub (global.get $top+1) (i32.const 1)))
    )
    
  2. And a helper function in the RTS to perform throws:

    (func $throw (param $exn i32)
      (local $i i32)
      (for $i in $top+1 - 1 to 0
        (call $pop-handler)
        ;; a handler that matches will not return
        (call_indirect $handlers (local.get $exn) (local.get $i))
      )
      (report uncaught exception)
    )
    
  3. Compile a source-level throw E to (call $throw (E)).

  4. Compile a source-level try A catch (e) when (C) B (with the catch, say, on line XYZ) to:

    (try
      (do
        (call $push-handler (ref.func $catch-XYZ))
        (A)
        (call $pop-handler)
      )
      (catch $unwind-to-XYZ
        (B)
      )
    )
    
  5. ...where a distinct exception tag and a handler function are generated for the catch:

    (exception $unwind-to-XYZ (param i32))  ;; assume i32 is representation of language exception
    (func $catch-XYZ (param $exn i32)
      (if (C)
        (then (throw $unwind-to-XYZ (local.get $exn))
      )
    )
    

This can easily be extended to multiple catch clauses etc. There is a small overhead here in that each try has to push/pop the shadow stack. But before we plan for lowering (potentially multiple semantic variations of) similar machinery into Wasm, my take is that there should be sufficient evidence that this overhead is practically relevant, for the minority of languages that need it.

@RossTate
Copy link
Contributor

I very much agree that point 1 is a misconception in the current proposal, but I disagree with point 2 (or more precisely, it's more nuanced, as I'll discuss below).

there has to be a primitive mechanism for non-local control transfer/abort, and basic throw/catch as in the current proposal can be viewed as just that.

I agree with this, though there's a missing piece, which is that you need a way to unwind, e.g. destructors, (part of) finally, fault, unwind-protect (and part of dynamic-wind). All of the latter are conceptually blocks/functions of type [] -> [].

In a way, it's just a glorified and well-behaved longjmp with unwind.

This is not true. throw/catch uses dynamic scoping whereas longjmp is more akin to lexical scoping (e.g. block/return-from in Common Lisp). Of course we can emulate the latter with the former provided we have a way to dynamically generate fresh identities, but it is important to recognize this distinction and that for probably a good while the latter will be emulated with the former. In particular, it means that not all uses of throw are for exceptions, which for example means compiling a C++ catch (...) to something (like catch_all) that intercepts them is problematic. In the meanwhile, we want both forms of non-local control flow to run unwinding code in the middle.


To wit, let me sketch how one could, quite easily, compile 2PEH with filters under the current proposal, without any use of try-unwind

There are multiple problems with this suggestion, many of which line up with the reasons why we aren't telling programs to implement exception handling with shadow stacks.

  1. Have the language's RTS define a (per-thread) table that represents its handler stack:

Here you're using a global table to encode thread-local state. According to WebAssembly/design#1375 (comment), this is "the (somewhat embarrassing) current hack we use for web workers, but it isn't the plan for a first-class (pure wasm) threads".

  1. And a helper function in the RTS to perform throws:

This helper function has a bug. It pops handlers during the search. Those handlers need to stay on in case, during unwinding after the search has completed, an unwinder throws an exception that should be handled by one of those handlers. (There is Common Lisp code that specifically relies on this behavior.)

  1. Compile a source-level try A catch (e) when (C) B (with the catch, say, on line XYZ) to:

This has two bugs. For the first bug, you need to pop the handler regardless of how control exits. That should include when foreign code causes non-local control to exit the block, which is covered by using try/unwind. Using try/unwind also fixes the bug above.

The second bug is that, if the handler function matches, then you need to make sure to go to this specific call frame's catcher (as there might be multiple frames for the same function on the stack). So you need to dynamically generate a unique identifier for the call frame and somehow store that in/with the pushed handler (so that it can be put in the event's payload), and similarly the catch needs to check that the identifier in the payload matches the current call frame's generated identifier. (This is not a problem for, say, C++ handlers. Though even then the payload should probably specify which of the compiled C++-catch clauses should be branched to.)

Note that in more advanced 2PEH languages/implementations, the handler would normally reference and even mutate values in the corresponding call frame. So these languages will require allocating a closure each time you push the handler, updating the closure every time a relevant value is mutated, and pulling changes from the closure when the handler is popped. This work is wasted if the handler is never invoked.

  1. ...where a distinct exception tag and a handler function are generated for the catch:

Note that a foreign catch_all while mistake these thrown events for unhandled exceptions.


Reviewing the above exercise, I see that the above implementation strategy needs try/unwind to interact correctly with foreign exceptions (and at the same time interact well with its own exceptions). I also see that a catch_all in foreign code (say representing a C++ catch (...)) would mistake a non-local transfer of control for an exception.

There are also issues with how it composes with other features in the pipeline. In particular, algebraic effects or first-class stacks provide ways to change the current stack and consequently change the handlers on the stack. This means that the global table can become out of touch with reality. I also identified a number of inefficiencies for more advanced 2PEH language implementations caused by trying to maintain this table as a shadow of the real stack. All these problems would be addressed by actually using the real stack to implement 2PEH rather than some attempted portrait of the stack.

@rossberg
Copy link
Member

I agree with this, though there's a missing piece, which is that you need a way to unwind, e.g. destructors, (part of) finally, fault, unwind-protect (and part of dynamic-wind).

My whole point is that these might not be "missing", because they do not necessarily need to be primitive in a low-level VM. Anything that can be done in user space without significant loss ought to be done there.

throw/catch uses dynamic scoping whereas longjmp is more akin to lexical scoping

I don't see that. A longjmp's target is first-class and selected completely dynamically, not lexically.

  1. Have the language's RTS define a (per-thread) table that represents its handler stack:

Here you're using a global table to encode thread-local state. According to WebAssembly/design#1375 (comment), this is "the (somewhat embarrassing) current hack we use for web workers, but it isn't the plan for a first-class (pure wasm) threads".

I think you are misrepresenting @lukewagner's comment. He was referring to thread-local state being implicitly identified with instance state right now, because there's no fork and no explicit sharing annotation, not to using thread-local state at all -- of course RTSs will need TLS for certain things, e.g., to maintain the shadow stack for C. In any case, this question seems completely orthogonal.

  1. And a helper function in the RTS to perform throws:

This helper function has a bug. It pops handlers during the search. Those handlers need to stay on in case, during unwinding after the search has completed, an unwinder throws an exception that should be handled by one of those handlers. (There is Common Lisp code that specifically relies on this behavior.)

You are right, I was prematurely optimising code size by moving the pop from the catch to $throw while writing down the sketch. I hope we can agree that this is easily fixed.

  1. Compile a source-level try A catch (e) when (C) B (with the catch, say, on line XYZ) to:

This has two bugs. For the first bug, you need to pop the handler regardless of how control exits. That should include when foreign code causes non-local control to exit the block, which is covered by using try/unwind. Using try/unwind also fixes the bug above.

Remember this was just a sketch. I agree that to interact cleanly with foreign code you'll need to run pop in a finalizer, but catch-all/rethrow will do just fine.

The second bug is that, if the handler function matches, then you need to make sure to go to this specific call frame's catcher (as there might be multiple frames for the same function on the stack). So you need to dynamically generate a unique identifier for the call frame and somehow store that in/with the pushed handler (so that it can be put in the event's payload), and similarly the catch needs to check that the identifier in the payload matches the current call frame's generated identifier.

I think that's not true. For a given catch, the innermost instance of handler in the shadow stack will correspond to the innermost handler on the Wasm stack by construction. I don't see why anything else would be needed.

Note that in more advanced 2PEH languages/implementations, the handler would normally reference and even mutate values in the corresponding call frame. So these languages will require allocating a closure each time you push the handler, updating the closure every time a relevant value is mutated, and pulling changes from the closure when the handler is popped. This work is wasted if the handler is never invoked.

Well, this relates directly to the problem I keep pointing out with your vision for an engine solution: the implicit assumption that an engine could execute code in a call frame that is not on top of the stack. In most engines I have seen that won't work without implementing fragmented stack frames, for reasons I have pointed out several times. I highly doubt fragmented stack frames are something engine makers would be willing to accept -- it seems like a bad idea for Wasm. So I believe that any built-in solution will almost inevitably have the exact same limitation.

  1. ...where a distinct exception tag and a handler function are generated for the catch:

Note that a foreign catch_all while mistake these thrown events for unhandled exceptions.

Such a catch-all will run in the 2nd phase, where it seems like it would evoke exactly the behaviour it should? If it rethrows, e.g., because it encodes a finally, then unwinding continues as one would expect. If it swallows (a.k.a. handles) the exception, that seems fine as well, the outer handlers will remain on the shadow stack and thus active (provided my mistake above is corrected).

Reviewing the above exercise, I see that the above implementation strategy needs try/unwind to interact correctly with foreign exceptions (and at the same time interact well with its own exceptions). I also see that a catch_all in foreign code (say representing a C++ catch (...)) would mistake a non-local transfer of control for an exception.

See above, there may well be other potential issues I'm missing right now, but this probably isn't one of them.

There are also issues with how it composes with other features in the pipeline. In particular, algebraic effects or first-class stacks provide ways to change the current stack and consequently change the handlers on the stack. This means that the global table can become out of touch with reality. I also identified a number of inefficiencies for more advanced 2PEH language implementations caused by trying to maintain this table as a shadow of the real stack. All these problems would be addressed by actually using the real stack to implement 2PEH rather than some attempted portrait of the stack.

Interaction with continuations is the main concern here, I agree. This one I don't have a good answer for right now -- though I do point out that this doesn't represent a new problem, we'll already need to solve that somehow for other existing uses of shadow stack, as in C.

OTOH, this mirrors my dual concern with baking ad-hoc mechanisms into the language that risk ending up with a hacky combined semantics if they lack a proper foundation.

@aheejin
Copy link
Member

aheejin commented Oct 15, 2020

@rossberg Thanks for your writeup. What you wrote is very much like what I thought a VM would do for two-phase internally, and I don't really have a preference whether it should be done in the VM side or in the user side, as long as it shows reasonable performance. But I'm still not sure how that handles cleanup code.

  • I might have missed this, but I don't see how your scheme distinguishes user catch handler and cleanup code. Is it via a different tag? Is there a special "cleanup" tag or something so we ignore that tag when searching?
  • In your scheme, let's say we searched the handler stack and found a user catch handler somewhere. Then we have to actually unwind the stack (= second phase), while running cleanup code. After running a part of cleanup code, how do we resume the second phase? As I said, rethrow initiates a new search, which we don't want. We want resuming functionality. (We can have a separate resume instruction, which I also mentioned)
  • Your scheme lacks the functionality of delegate, but I think this can be added by tweaking the throw function code, such as, not decrementing the loop index by 1 or by N.

@rossberg
Copy link
Member

@aheejin:

  • I might have missed this, but I don't see how your scheme distinguishes user catch handler and cleanup code. Is it via a different tag? Is there a special "cleanup" tag or something so we ignore that tag when searching?

Clean-up code would compile to a plain try-catch_all-rethrow without a counterpart on the shadow handler stack. So it runs as expected when a throw $unwind-to-XYZ is invoked from that stack.

  • In your scheme, let's say we searched the handler stack and found a user catch handler somewhere. Then we have to actually unwind the stack (= second phase), while running cleanup code. After running a part of cleanup code, how do we resume the second phase? As I said, rethrow initiates a new search, which we don't want. We want resuming functionality. (We can have a separate resume instruction, which I also mentioned)

The observation underlying this scheme is that 2PEH should not be understood as 1PEH + a separate unwind phase afterwards, but as 1PEH + a separate search phase before. Throw/catch becomes the unwinding primitive, not the search primitive.

So, rethrow does exactly what you want here, because throw/catch is only used to implement the unwinding phase (after search has already been completed). And rethrow resumes that. Actual source-level throw is implemented as a more high-level operation via the $throw function I showed. And if the source language had something akin to rethrow itself, then that would equally be implemented by a higher-level abstraction that invokes the search phase first explicitly.

  • Your scheme lacks the functionality of delegate, but I think this can be added by tweaking the throw function code, such as, not decrementing the loop index by 1 or by N.

Hm, I believe that's orthogonal. While I only showed a source-level try compiling to try-catch, it can likewise be compiled to try-delegate when needed. Whether it's catch or delegate is a handler-side detail that does not affect the search and unwind mechanism leading up to it (or being resumed from it). You only have to ensure that you pop the necessary shadow handlers if delegation skips over respective handlers (which would be simpler if we replaced try-delegate with a more low-level rethrow_in, as suggested earlier).

@RossTate
Copy link
Contributor

@rossberg All high-level and low-level systems with 2PEH (that I know of) have an unwind-like construct. But you claimed that this feature that was added to make programs compose better with each other and with future programs using future extensions is unnecessary. To back this claim, you gave an implementation strategy for 2PEH. But that implementation strategy does not compose well with other programs or with future extensions (specifically extensions that you are advocating for). Regardless of your concerns about how I have suggested we might support (composable) 2PEH in the future, there seems to be significant evidence that unwind will be helpful for any way we might support (composable) 2PEH in the future.

@aheejin
Copy link
Member

aheejin commented Oct 15, 2020

@rossberg

So, rethrow does exactly what you want here, because throw/catch is only used to implement the unwinding phase (after search has already been completed). And rethrow resumes that. Actual source-level throw is implemented as a more high-level operation via the $throw function I showed. And if the source language had something akin to rethrow itself, then that would equally be implemented by a higher-level abstraction that invokes the search phase first explicitly.

I still don't understand how your scheme support cleanup code in the first phase. In the first phase, we should search for a matching handler. This shouldn't include cleanup code. If there's no matching handler, the program should crash without unwinding the stack.

You also don't like an option that filter functions in catch requires three possible values (1. catch handler + my exception, 2. catch handler + not my exception, 3. cleanup). Then how is the first search phase supposed to figure out a given catch is cleanup code or not? If it is cleanup code, the first phase should just skip it. We shouldn't catch an exception and unwind the stack to the point of the cleanup code, if there's no user catch handler somewhere up in the stack.

Or, do you think we should delegate everything to user space so each handler does its own thing, including saying "I'm cleanup code" "I'm a user handler" or something?


Also you are suggesting rethrow should be the same as resume, which is confusing. In my understanding, rethrow initiates a new two-phase search, and resume resumes the (already ongoing) second phase search. If we change rethrow to mean resume, can other languages, that has rethrow like construct, use it for their purpose?


If your scheme can answer both of these problems, I actually support its removal.

@RossTate
Copy link
Contributor

For a given catch, the innermost instance of handler in the shadow stack will correspond to the innermost handler on the Wasm stack by construction. I don't see why anything else would be needed.

@rossberg It is important to be aware that this is not true. Only languages that do not strictly speak need 2PEH can have this property, and even then not all such languages do. As an example of this latter subset, in Python the kind of exception that is caught by a given except can be a dynamically determined value, and so different call frames for the same function can catch different exceptions. And for languages that do need 2PEH, the "filter" function (which is a misnomer) can have context/state-sensitive code.

This is why the macro-implementation of Common Lisp's handler-case (its version of filter-by-exception-tag handling) uses block/return-from, i.e. lexically scoped non-local control. The use of lexical scoping makes sure the unwinding phase proceeds up to the specific call frame on the stack already determined by the search phase, with no need to distinguish the chosen stack frame from other call frames of the same function.

Of course, we are not adding such a construct at the moment, and maybe we never will. Or maybe we will design a completely different way to do 2PEH. We just wanted to make room for such extensions, and the unwind concept has been demonstrated to compose well with a wide range of control constructs.

@rossberg
Copy link
Member

@aheejin:

I still don't understand how your scheme support cleanup code in the first phase. In the first phase, we should search for a matching handler. This shouldn't include cleanup code. If there's no matching handler, the program should crash without unwinding the stack.

The first phase searches the shadow handler table and does not execute anything. Clean-up code is not in that table, so isn't taken into account.

You also don't like an option that filter functions in catch requires three possible values (1. catch handler + my exception, 2. catch handler + not my exception, 3. cleanup). Then how is the first search phase supposed to figure out a given catch is cleanup code or not?

As I said, clean-up code isn't even considered in the search. IOW, it is implemented differently from handlers (by the compiler) and no dynamic differentiation is needed. So catch and unwind are different just as they would be as Wasm constructs, except that the distinction is fully implemented in user code.

Also you are suggesting rethrow should be the same as resume, which is confusing. In my understanding, rethrow initiates a new two-phase search, and resume resumes the (already ongoing) second phase search. If we change rethrow to mean resume, can other languages, that has rethrow like construct, use it for their purpose?

Oh, I am not proposing to change anything about rethrow. I am demonstrating that we would not need to change anything to implement 2PEH. Rethrow already does what it takes. I am showing a producer-side implementation technique that would use it in a certain way (and that may be non-obvious if one only thinks about 2PEH in a particular way). And interacts correctly with another 1PEH language that uses rethrow in the conventional way.

There is one caveat, though, and that might be what you are getting at: if the 2PEH source language has a source-level rethrow itself, then that would compile to a regular Wasm-level throw under this scheme. So it would only include the stack trace from the rethrow point. But the combination of 2PEH and rethrow is rare enough and the loss small enough that it might be considered okay?

@RossTate:

For a given catch, the innermost instance of handler in the shadow stack will correspond to the innermost handler on the Wasm stack by construction. I don't see why anything else would be needed.

@rossberg It is important to be aware that this is not true.

Huh? Handlers on the real stack have a 1-to-1 correspondence with handlers on the shadow stack. That's the very definition of a shadow stack. If there are multiple invocations of the same function with a handler then there are multiple entries for that handler on the shadow stack, in the same order. That is all you need.

You seem to be conflating this with a completely separate problem, which is that some languages might require those entries to have access to some context from the handler's enclosing function. In such a setting, you have to program up the context closure, either by using Wasm closures (func.bind) as filters, or short of that, the good old way of threading through a context parameter manually. The same would be true for built-in 2PEH filters, because even then a filter cannot (realistically) be expected to be runnable in the function's frame, as I have pointed out various times already.

@RossTate
Copy link
Contributor

Handlers on the real stack have a 1-to-1 correspondence with handlers on the shadow stack. That's the very definition of a shadow stack. If there are multiple invocations of the same function with a handler then there are multiple entries for that handler on the shadow stack, in the same order. That is all you need.

In your sketch, every handler has a corresponding catch, but your sketch does not have the property that the throw in a handler gets caught by its corresponding catch. That is, your sketch has a bug caused by using dynamically scoped non-local control rather than lexically scoped non-local control as other implementations of 2PEH use. I understand that it is a just a sketch, but the issue is that the sketch oversimplifies the issues and therefore makes particular solutions seem viable even though those solutions would not scale to real uses of 2PEH.

because even then a filter cannot (realistically) be expected to be runnable in the function's frame, as I have pointed out various times already.

You have said this many times, and yet it is not in line with what implementers said in other discussions. Regardless, unwind does not force us to commit to anything, whereas you seem to be arguing that we should commit to not supporting any extensions.

@rossberg
Copy link
Member

In your sketch, every handler has a corresponding catch, but your sketch does not have the property that the throw in a handler gets caught by its corresponding catch. That is, your sketch has a bug caused by using dynamically scoped non-local control rather than lexically scoped non-local control as other implementations of 2PEH use.

EH control transfer is non-lexical by nature. Sorry, I don't see how there is a bug of the kind you are claiming. Can you provide an example of a throw that you think would go wrong?

because even then a filter cannot (realistically) be expected to be runnable in the function's frame, as I have pointed out various times already.

You have said this many times, and yet it is not in line with what implementers said in other discussions.

I don't know who you talked to and what you asked them, but I have worked on some of V8's calling conventions myself, and hence can tell you with some confidence that this will not fly without fundamental changes to its stack usage and assumptions about it, and likely affecting performance. I can't speak for other engines, but I would be very surprised if it worked seamlessly for them.

Regardless, unwind does not force us to commit to anything, whereas you seem to be arguing that we should commit to not supporting any extensions.

I'm arguing that we need to support features in a non-additive manner and that there are too many premature assumptions right now.

@RossTate
Copy link
Contributor

EH control transfer is non-lexical by nature.

2PEH has two phases. The search phase is non-lexical by nature—you are searching the stack for a matching handler. But the unwinding phase is lexical—the matching handler indicates where to transfer control to (after unwinding). As I mentioned, this is explicit in Common Lisp, which desugars handler-case to a handler-bind where the handler function checks the appropriate tags on the exception and then, upon finding a match, uses return-from to jump (after unwinding) to the appropriate catch body.

Sorry, I don't see how there is a bug of the kind you are claiming. Can you provide an example of a throw that you think would go wrong?

(exception $unwind-to-XYZ (param i32))  ;; assume i32 is representation of language exception
(func $catch-XYZ (param $exn i32)
  (if (C)
    (then (throw $unwind-to-XYZ (local.get $exn))
  )
)

Suppose C is stateful. For example, it mutably flips the bit of some global boolean variable and checks if the result is true. If we have two such handlers on the shadow stack and the global boolean variable is initially true, the innermost handler will flip the bit and not throw, and then the outermost handler will flip the bit and throw. But then your innermost catch will catch the exception, whereas control was supposed to go to the outermost catch.

@rossberg
Copy link
Member

Ah, okay, excellent point. Thanks! I stand corrected.

Of course, it wouldn't be hard to work around that by using a secondary dynamic token and compare that in the handler. But I admit that reduces the appeal of that solution. A more elegant solution would be possible if we removed the (somewhat artificial) limitation of static exception tag allocation in the EH proposal, which might be desirable for other things as well.

@RossTate
Copy link
Contributor

Thanks for helping me find the right illustrative example. Did that also help convey to you the rationale of unwind then?

@rossberg
Copy link
Member

Yeah, I'm still uncomfortable with committing to ad-hoc solutions prematurely and without a strategy that sound sounds different from "let's built in every feature". Dynamic tag generation would be a more orthogonal and principled solution to the particular problem you pointed out. OTOH, the potential harm of unwind is hopefully limited, so I won't die on this hill. But I do remain worried about the path this approach puts us on.

@RossTate
Copy link
Contributor

At present the proposal has dynamically scoped local and non-local control but only lexically scoped local, so it seems natural to complete the square with lexically scoped non-local control. Regarding principles, lexically scoped non-local control has stronger parametricity properties. It also supports more efficient (in terms of order of complexity) non-local control transfers because there is no need to search. Dynamic tag generation can emulate lexically scoped non-local control but with weaker guarantees and poorer performance, and the current proposal can already emulate dynamic tag generation. As for the path, completing the square seems to cover the single-stack control space very well.

@aheejin
Copy link
Member

aheejin commented Nov 4, 2020

I'm not very sure why we are discussing lexical vs. dynamically scoping here. Do we plan to support any dynamically scoped language? Is wasm a dynamically scoped model? (I think no)

I share @rossberg's concern that, even if we need a functionality similar to unwind in the follow-on 2PEH proposal, given that it does not have a meaningful difference with catch_all now, adding it in the current proposal seems premature. We don't know what the follow-on 2PEH proposal will look like, and adding unwind there looks OK to me.

@ioannad
Copy link
Collaborator

ioannad commented Nov 4, 2020

given that it [unwind] does not have a meaningful difference with catch_all now, adding it in the current proposal seems premature. We don't know what the follow-on 2PEH proposal will look like, and adding unwind there looks OK to me.

I agree with this. And as I feared in a comment to PR#137, trying to specify unwind is already leading to long discussions which probably belong in a follow-on 2PEH proposal. I think the point of changing the 2nd EH proposal to the current, was just to have it be compatible with a 2PEH follow-on proposal, not more.

ioannad added a commit to ioannad/exception-handling that referenced this issue Nov 10, 2020
This is an attempt to formally describe @aheejin's 3rd proposal, which she presented to the Wasm CG, and which was voted to be the new EH proposal, on September 18, 2020. This is not formal spec that I have developed, but a formal description of this 3rd proposal.

This is a reworked form of my [first attempt on this formal spec overview](WebAssembly#87 (comment)) edited with my new understanding of the spec based on the discussion below, and in other issues, and according to @aheejin 's [3rd proposal overview](https://github.com/WebAssembly/exception-handling/blob/f7a4f60d11fb6326fc13f84d3889b11d3873f08a/proposals/Exceptions.md) in PR WebAssembly#137.

This is in the form of a document as @tlively [requested](WebAssembly#142 (comment)), to make discussion on specific points easier.

I wrote this formal spec overview roughly in the style of the 2nd proposal's [formal spec overview](WebAssembly#87 (comment)) by @rossberg.

Particular points of interest:

- In this I assume `rethrow` does have an immediate, as it is now described in WebAssembly#137.
- The instruction `unwind` now reduces to `catch_all ... rethrow` as specified in Heejin's overview.
- Because unwind is much simpler in this MVP, there is no `throw_point` anymore and `caught_m` is much simpler.
- The introduction of `caught_m` now also introduces a label around the catch instructions, which label a rethrow instruction can reference.
- Added explanation of the peculiar side condition in try's execution rule - just trying to make the rules more compact.

I would be happy if anyone could point out things that are wrong or that I misunderstood.
@ioannad
Copy link
Collaborator

ioannad commented Nov 10, 2020

I just made PR #143 with a document attempting to describe the current spec, as @tlively requested.

@aheejin if you like the idea of keeping such a formal-overview.md document, perhaps we could close this issue by pointing to that PR? Or to the document when everyone agrees with it?

@aheejin
Copy link
Member

aheejin commented Nov 10, 2020

Yes, I think continuing discussions in a dedicated issue or PR is more desirable. This issue started as a question from someone else that if we had a formal spec at all, and got expanded to include a number of topics I can't even count.

@aheejin aheejin closed this as completed Nov 10, 2020
ioannad pushed a commit to ioannad/exception-handling that referenced this issue Feb 23, 2021
This PR adds the dependency to multi-value to the exception handling proposal text and to the README. 

I wrote an explanation of this dependency on the proposal text, but it's easier to see this once the verification and execution steps of `br_on_exn` and of `try` blocks are written out, as done [here](WebAssembly#87 (comment)) by @rossberg :

Validation:

```
ft = t1* -> t2*
C, label t2* |- e1* : t1* -> t2*
C, label t2* |- e2* : exnref -> t2*
-----------------------------------
C |- try ft e1* catch e2* end : ft

C_label(l) = C_exn(x) = t*
-------------------------------------
C |- br_on_exn l x : exnref -> exnref
```

Execution: 

```
v^n (try ft e1* catch e2* end)  -->  catch_m{e2*} (label_m{} v^n e1* end) end)
  (iff ft = t1^n -> t2^m)
S; F; catch_m{e*} T[v^n (throw a)] end  -->  S; F; label_m{} (exn a v^n) e* end
  (iff S_exn(a) = {typ t^n})

F; (exn a v*) (br_on_exn l x)  -->  F; v* (br l)
  (iff F_exn(x) = a)
```

Concerning the functionality of `try`-`catch` blocks, note especially the passing of `v^n` values into a `label_m{}`.
Concerning the functionality of `br_on_exn`, note especially the execution step resulting in a `br` instruction.
ioannad added a commit to ioannad/exception-handling that referenced this issue Feb 23, 2021
…bly#121)

Included in this PR:

- Detailed core formal spec additions aiming to fully implement:
  + the "informal formal" core spec laid out by Andreas @rossberg here: WebAssembly#87 (comment) and
  + the core spec outlined in the [proposal overview](https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md), while removing the mention of "events" everywhere but in the binary format ,
  + prose for the above in: syntax, validation, execution, binary format, text format, and the appendix, as well as an example for throw contexts.
- Travis-ci build status icon to `README.md`. The contents of this PR get built and deployed on my fork's github page successfully, using [the howto instructions](https://github.com/WebAssembly/proposals/blob/master/howto.md#setting-up-travis-and-github-page).

Not included in this PR:

- Neither the new values representing unresolved throw contexts, nor the additional rules for the execution contexts, which we discussed in WebAssembly#87, are added here. I plan to add these separately.
- No JS API/Web API changes.

Side comments:

- In [document/core/text/types.rst](https://ioannad.github.io/exception-handling/core/text/types.html#text-refedtype): I'm not sure whether `refedtype ::= ... | event ⇒ exnref` should have `event` there or something else (perhaps `exception`?) However, since currently the only events are exceptions, this is probably not really an issue.
- The "informal formal spec" defines the administrative instruction which represents an exception package as `exnref`, i.e.,  the same name as the type. I called this administrative instruction `ref.exn` instead for two reasons;  to match the style of `ref.null` and `ref.extern`, and to avoid ambiguity in the discussions.
- I removed multi-value mentions from `README.md` because multi-value is now part of the main spec. For the same reason I didn't add "multi-value" to the list of included proposals at the top of the core spec landing page.
ioannad added a commit to ioannad/exception-handling that referenced this issue Feb 23, 2021
…bly#121)

Included in this PR:

- Detailed core formal spec additions aiming to fully implement:
  + the "informal formal" core spec laid out by Andreas @rossberg here: WebAssembly#87 (comment) and
  + the core spec outlined in the [proposal overview](https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md), while removing the mention of "events" everywhere but in the binary format ,
  + prose for the above in: syntax, validation, execution, binary format, text format, and the appendix, as well as an example for throw contexts.
- Travis-ci build status icon to `README.md`. The contents of this PR get built and deployed on my fork's github page successfully, using [the howto instructions](https://github.com/WebAssembly/proposals/blob/master/howto.md#setting-up-travis-and-github-page).

Not included in this PR:

- Neither the new values representing unresolved throw contexts, nor the additional rules for the execution contexts, which we discussed in WebAssembly#87, are added here. I plan to add these separately.
- No JS API/Web API changes.

Side comments:

- In [document/core/text/types.rst](https://ioannad.github.io/exception-handling/core/text/types.html#text-refedtype): I'm not sure whether `refedtype ::= ... | event ⇒ exnref` should have `event` there or something else (perhaps `exception`?) However, since currently the only events are exceptions, this is probably not really an issue.
- The "informal formal spec" defines the administrative instruction which represents an exception package as `exnref`, i.e.,  the same name as the type. I called this administrative instruction `ref.exn` instead for two reasons;  to match the style of `ref.null` and `ref.extern`, and to avoid ambiguity in the discussions.
- I removed multi-value mentions from `README.md` because multi-value is now part of the main spec. For the same reason I didn't add "multi-value" to the list of included proposals at the top of the core spec landing page.
ioannad added a commit to ioannad/exception-handling that referenced this issue Mar 12, 2021
This is an attempt to formally describe @aheejin's 3rd proposal, which she presented to the Wasm CG, and which was voted to be the new EH proposal, on September 18, 2020. This is not formal spec that I have developed, but a formal description of this 3rd proposal.

This is a reworked form of my [first attempt on this formal spec overview](WebAssembly#87 (comment)) edited with my new understanding of the spec based on the discussion below, and in other issues, and according to @aheejin 's [3rd proposal overview](https://github.com/WebAssembly/exception-handling/blob/f7a4f60d11fb6326fc13f84d3889b11d3873f08a/proposals/Exceptions.md) in PR WebAssembly#137.

This is in the form of a document as @tlively [requested](WebAssembly#142 (comment)), to make discussion on specific points easier.

I wrote this formal spec overview roughly in the style of the 2nd proposal's [formal spec overview](WebAssembly#87 (comment)) by @rossberg.

Particular points of interest:

- In this I assume `rethrow` does have an immediate, as it is now described in WebAssembly#137.
- The instruction `unwind` now reduces to `catch_all ... rethrow` as specified in Heejin's overview.
- Because unwind is much simpler in this MVP, there is no `throw_point` anymore and `caught_m` is much simpler.
- The introduction of `caught_m` now also introduces a label around the catch instructions, which label a rethrow instruction can reference.
- Added explanation of the peculiar side condition in try's execution rule - just trying to make the rules more compact.

I would be happy if anyone could point out things that are wrong or that I misunderstood.
ioannad added a commit to ioannad/exception-handling that referenced this issue Jun 25, 2021
This is an attempt to formally describe @aheejin's 3rd proposal, which she presented to the Wasm CG, and which was voted to be the new EH proposal, on September 18, 2020. This is not formal spec that I have developed, but a formal description of this 3rd proposal.

This is a reworked form of my [first attempt on this formal spec overview](WebAssembly#87 (comment)) edited with my new understanding of the spec based on the discussion below, and in other issues, and according to @aheejin 's [3rd proposal overview](https://github.com/WebAssembly/exception-handling/blob/f7a4f60d11fb6326fc13f84d3889b11d3873f08a/proposals/Exceptions.md) in PR WebAssembly#137.

This is in the form of a document as @tlively [requested](WebAssembly#142 (comment)), to make discussion on specific points easier.

I wrote this formal spec overview roughly in the style of the 2nd proposal's [formal spec overview](WebAssembly#87 (comment)) by @rossberg.

Particular points of interest:

- In this I assume `rethrow` does have an immediate, as it is now described in WebAssembly#137.
- The instruction `unwind` now reduces to `catch_all ... rethrow` as specified in Heejin's overview.
- Because unwind is much simpler in this MVP, there is no `throw_point` anymore and `caught_m` is much simpler.
- The introduction of `caught_m` now also introduces a label around the catch instructions, which label a rethrow instruction can reference.
- Added explanation of the peculiar side condition in try's execution rule - just trying to make the rules more compact.

I would be happy if anyone could point out things that are wrong or that I misunderstood.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants