Assert everything #14

lukewagner · 2015-08-18T22:51:13Z

This patch switches all the .wasm tests to asserteq instead of simply invoking.

lukewagner · 2015-08-19T03:36:58Z

With this, everything is asserted which makes it nice to run all the tests (src/wasm test/*.ml) and have silence mean "success". For this reason, I added a second commit to flip the default of Flags.print_sig to false. There is still 1 value printed by one remaining Invoke which actually fails if I try to asserteq it :) That's for a separate PR, though.

rossberg · 2015-08-19T13:24:00Z

Hm, this seems to break kg's recent test runner change, which checks the output of tests.

We probably need to agree on one mechanism for verifying tests. Should we use internal asserts or should we use external output validation? I'm fine either way, but two competing (and incompatible) approaches are a problem.

@kg, WDYT?

kg · 2015-08-19T14:26:25Z

I'm fine with internal assertions for general use but they don't cover all scenarios. I discussed this with luke on IRC. For things like testing trap behavior on failure cases, validation, etc we will need to have an external test harness that checks the result of running the interpreter.

We could write all this testing infrastructure in ocaml and wrap it around the prototype, but that introduces some nasty ecosystem issues I think we should avoid. (I'm not particularly attached to python unittest, but we definitely want to use an existing production-quality testing framework.)

Luke pointed out that he wanted the python test runner to be quiet, also, which I agree with. I can make some minor changes to do that today.
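
An external check of the kind described above might look like the following minimal sketch, using Python's `unittest` and `subprocess`. The child command is a stand-in Python one-liner that prints an error and exits non-zero; it is an assumption for illustration, not the real interpreter invocation, and the error text is made up.

```python
import subprocess
import sys
import unittest

# Stand-in for the interpreter: a child process that reports an error and
# exits with a non-zero status. The message text is hypothetical.
FAKE_INTERPRETER = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('error: runtime trap: out of bounds\\n'); sys.exit(1)",
]


def run_tool(argv):
    """Run a command and return (exit_code, stdout, stderr) as text."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


class TrapBehaviorTest(unittest.TestCase):
    def test_trap_is_reported(self):
        code, out, err = run_tool(FAKE_INTERPRETER)
        # A failing run should exit non-zero and say why on stderr.
        self.assertNotEqual(code, 0)
        # Substring match, so small wording changes do not break the test.
        self.assertIn("runtime trap", err)
        # The stand-in writes nothing to stdout, so a quiet runner stays quiet.
        self.assertEqual(out, "")
```

Matching on a substring rather than the full message keeps the test stable when the wording of the error changes slightly.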

rossberg · 2015-08-19T14:53:49Z

On 19 August 2015 at 16:26, Katelyn Gadd notifications@github.com wrote:

> I'm fine with internal assertions for general use but they don't cover all
> scenarios. I discussed this with luke on IRC. For things like testing trap
> behavior on failure cases, validation, etc we will need to have an external
> test harness that checks the result of running the interpreter.

Makes sense. The change probably still needs to adjust the output
expectations, though.

> We could write all this testing infrastructure in ocaml and wrap it around
> the prototype, but that introduces some nasty ecosystem issues I think we
> should avoid. (I'm not particularly attached to python unittest, but we
> definitely want to use an existing production-quality testing framework.)

Yes, I don't think it's necessary to do that in OCaml. I actually prefer to
have text as a solid abstraction boundary. :)

> Luke pointed out that he wanted the python test runner to be quiet, also,
> which I agree with. I can make some minor changes to do that today.

Do you think it's also possible to modify the runner such that it doesn't
invoke ocamlbuild by default? For the make users among us (ocamlbuild
rejects build artefacts in src, so I always have to make distclean first).

kg · 2015-08-19T14:59:05Z

Absolutely. If you want I can take that out. I wanted to avoid the scenario where somebody makes a bunch of changes, runs tests, sees them pass (because they forgot to build) and checks in. But we can solve that with CI later if it happens a lot.

lukewagner · 2015-08-19T15:51:39Z

@kg All of those conditions (traps, validation failures, etc) can be tested within .wasm files by adding new testing primitives to the scripting language. The only use case I see for external tool validation is testing the scripting primitives themselves (making sure they fail properly). For reference, this is what we've done in the SM test suite by adding shell-only builtins; the test harness just uses the process result code to determine pass/fail.
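
The exit-code convention mentioned above can be sketched roughly as follows. The script files are throwaway Python snippets standing in for test scripts, the Python interpreter stands in for the tool under test, and `run_all` and `write_temp_script` are hypothetical helpers, not part of any existing runner.

```python
import os
import subprocess
import sys
import tempfile


def run_all(command_prefix, test_paths):
    """Run each test file; a test passes iff the process exits with status 0.

    Returns the list of failing paths. Output is captured so that a fully
    passing run stays silent.
    """
    failures = []
    for path in test_paths:
        proc = subprocess.run(command_prefix + [path], capture_output=True, text=True)
        if proc.returncode != 0:
            failures.append(path)
    return failures


def write_temp_script(directory, name, body):
    """Helper for the demo: create a small script file and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write(body)
    return path


def demo():
    # Assumption: Python files stand in for test scripts, and the Python
    # interpreter stands in for the tool under test.
    with tempfile.TemporaryDirectory() as tmp:
        good = write_temp_script(tmp, "good.py", "assert 1 + 1 == 2\n")
        bad = write_temp_script(tmp, "bad.py", "assert 1 + 1 == 3\n")
        failures = run_all([sys.executable], [good, bad])
        return [os.path.basename(p) for p in failures]
```

With this shape, the assertions live inside each test file and the harness only needs to look at the process status.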

lukewagner · 2015-08-19T15:55:50Z

I meant to give motivation for internal assertions:

- single file
- if you update a test, you don't have to update the output file
- more flexible if we want to extend the command language (e.g., in SM we'll often exhaustively test a cartesian product of possibilities by running a test in a loop, instead of manually duplicating all the cases)
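
As a rough illustration of the last point, here is a small Python sketch that checks a property over a cartesian product of inputs in a loop instead of listing every case by hand. The function under test (`wrapping_add_i32`) and the chosen edge values are assumptions for the example; they are not taken from the SM test suite or from this patch.

```python
import itertools


def wrapping_add_i32(a, b):
    """Reference model: 32-bit two's-complement addition with wraparound."""
    total = (a + b) & 0xFFFFFFFF
    return total - 0x100000000 if total >= 0x80000000 else total


def check_commutative(values):
    """Check add(a, b) == add(b, a) over the full cartesian product.

    Returns the number of cases checked; raises AssertionError on a mismatch.
    """
    checked = 0
    for a, b in itertools.product(values, repeat=2):
        assert wrapping_add_i32(a, b) == wrapping_add_i32(b, a), (a, b)
        checked += 1
    return checked


# Interesting boundary values for a signed 32-bit integer.
EDGE_VALUES = [-2**31, -1, 0, 1, 2**31 - 1]
```

Five edge values yield 25 input pairs, which would otherwise be 25 hand-written cases.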

lukewagner · 2015-08-21T04:47:35Z

As mentioned in #17, I'd like to write some negative-validation tests and I think these tests will be more robust and easier to write by adding a new inline assertion rather than matching the error-string output (which will change over time and require a new file for each individual case I want to test). Can we agree to go the inline-test route?

rossberg · 2015-08-21T06:03:25Z

I'm fine either way. @kg?

But can you please adjust the existing test expectations to this change?

kg · 2015-08-21T18:11:24Z

I'm OK with basic assertions in the wasm but I stand firmly by my opinion that we should not put complex test logic in the wasm interpreter, at least not yet. Tests inevitably need to do things like substring matches on error messages, handle traps, handle out-of-memory conditions, etc.

Once wasm itself is expressive and robust enough to implement all these tests correctly (in a way that doesn't produce silent failure, undebuggable hangs or undebuggable crashes) it will make sense to move everything out of runtests.py into some wasm code and self-host. But not today.

titzer · 2015-08-21T21:21:10Z

I think declarative inputs to tests are best. E.g. main(0, 1) = 3,
main(1,0)=!trap. Otherwise you will end up with feature madness in the test
runner and end up with a complex DSL just to run anything; it's a nightmare
trying to debug the DSL that generates inputs to a test. We should favor
tests that have at most a few dozen input cases.

One example:
https://code.google.com/p/virgil/source/browse/test/execute/alloc_array00.v3

And also negative validation tests and trapping tests can easily be
expressed in a single line like:
https://code.google.com/p/virgil/source/browse/test/seman/top_def05.v3
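
A declarative scheme along these lines could be sketched as below. The line syntax is adapted from the `main(0, 1) = 3` / `main(1,0)=!trap` examples in this comment; the parser, the `Trap` exception, and the sample `main` function are all hypothetical and do not reflect how the Virgil tests or the wasm prototype are implemented.

```python
import re

# Matches lines such as "main(0, 1) = 3" or "main(1, 0) = !trap".
CASE_PATTERN = re.compile(r"^\s*(\w+)\(([^)]*)\)\s*=\s*(!trap|-?\d+)\s*$")


class Trap(Exception):
    """Raised by a function under test to signal a trap."""


def parse_case(line):
    """Parse one declarative case into (function_name, args, expected)."""
    match = CASE_PATTERN.match(line)
    if match is None:
        raise ValueError("malformed test case: %r" % line)
    name, raw_args, raw_expected = match.groups()
    args = tuple(int(a) for a in raw_args.split(",") if a.strip())
    expected = Trap if raw_expected == "!trap" else int(raw_expected)
    return name, args, expected


def run_case(functions, line):
    """Return True if the named function produces the expected outcome."""
    name, args, expected = parse_case(line)
    try:
        actual = functions[name](*args)
    except Trap:
        return expected is Trap
    return expected is not Trap and actual == expected


def main(a, b):
    """Hypothetical function under test: traps when b is zero."""
    if b == 0:
        raise Trap()
    return a + 3 * b
```

Each case is plain data (inputs and an expected outcome), so the test itself performs no computation of its own.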

lukewagner · 2015-08-22T00:08:21Z

@kg I don't see what "complex test logic" you're referring to; what I'm talking about is just more basic Script.commands to assert things like negative validation and faulting. Also, none of this logic is in the wasm interpreter (check.ml, eval.ml), it's in the script harness (script.ml).

@titzer I don't think anyone is suggesting a DSL here. The current tiny set of Script.commands, extended with commands to assert non-validation and faulting, should be basically equivalent to the Virgil tests you linked to (just different syntax and you can have a sequence of them). Or maybe you're already agreeing?

titzer · 2015-08-22T05:35:54Z

On Sat, Aug 22, 2015 at 2:08 AM, Luke Wagner notifications@github.com
wrote:

> @kg I don't see what "complex test logic" you're
> referring to; what I'm talking about is just more basic Script.commands
> to assert things like negative validation and faulting. Also, none of this
> logic is in the wasm interpreter (check.ml, eval.ml), it's in the
> script harness (script.ml).
>
> @titzer I don't think anyone is suggesting a
> DSL here. The current tiny set Script.commands
> (https://github.com/WebAssembly/spec/blob/master/ml-proto/src/script.mli#L6)
> extended with commands to assert non-validation and faulting should be
> basically equivalent to the virgil tests you linked to (just different
> syntax and you can have a sequence of them). Or maybe you're already
> agreeing?

Having script commands (e.g. assert) is not declarative, and yes, scripts
are a DSL, embedded in s-expressions. Tests should not be able to control
their own assertions for exactly the same reason that benchmarks should not
time themselves. Tests should not have to generate their own inputs with
setup code; instead, they should specify inputs and expected results.

rossberg · 2015-08-22T08:40:01Z

@titzer, I'm not sure I get it. AFAIAC, the assert "command" here is declarative. And you seem to have a similar mini-assert-DSL in your Virgil tests, it is just hiding in comments. That looks pretty equivalent to me.

titzer · 2015-08-23T22:09:18Z

The grammar allows an "expr_list" as an argument to asserteq.
That allows arbitrary computation, which is not at all declarative.

I am advocating that only inputs and outputs can be specified--i.e. no
scripting. Inputs should always be values, and outputs are either values,
validation error(s), or a trap.

lukewagner · 2015-08-24T15:48:03Z

@titzer I expect exprs were just done for expedience. All the current uses just pass literals and it'd be just as easy to restrict it to a literal_list.

lukewagner · 2015-08-24T20:59:30Z

Anyhow, this PR doesn't take away any testing functionality, it just makes all the invokes assert their results; that shouldn't be too controversial. I'll make separate PRs to add faulting and negative validation commands.

rossberg · 2015-08-25T06:39:02Z

LGTM. Btw, was there a specific reason you left the "cast" invocation alone?

lukewagner · 2015-08-25T13:22:52Z

@rossberg-chromium Yes, because if I make it asserteq (and fix the definition of float in lexer.mll), the assertion fails even though the float_to_string values are the same. Int64.of_float confirms that the least-significant bits are different, so I was guessing this was just imprecision in float_to_string, but I meant to ask in a follow-up.

rossberg · 2015-08-25T13:36:01Z

I see, makes sense. Yeah, you'd probably need to string-convert with higher precision (e.g. using Printf.sprintf) to see the exact value to check against. But tbh, I'm not sure it's worth it. Maybe it's better to modify the test such that it produces a more well-defined result (e.g., only using integers, not floats).
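
The effect being discussed can be reproduced with a short Python sketch: two doubles that differ only in their least-significant bit print identically at low precision but differ once enough digits are requested. Python's `%` formatting is used here as a stand-in for OCaml's `Printf.sprintf`, and the values 0.1 + 0.2 and 0.3 are chosen for illustration; they are not the values from the "cast" test.

```python
import struct


def bits_of(value):
    """Return the raw IEEE 754 double bit pattern as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


a = 0.1 + 0.2   # not exactly 0.3 in binary floating point
b = 0.3

short_a, short_b = "%g" % a, "%g" % b          # 6 significant digits
exact_a, exact_b = "%.17g" % a, "%.17g" % b    # enough to round-trip a double

# The short forms agree even though the underlying values do not.
same_when_short = short_a == short_b
same_when_exact = exact_a == exact_b
bit_distance = bits_of(a) - bits_of(b)
```

Seventeen significant digits are enough to distinguish any two distinct doubles, which is why the higher-precision format exposes the difference that the short format hides.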

lukewagner force-pushed the assert-everything branch from cdd56fe to a3bd427 Compare August 19, 2015 17:41

lukewagner mentioned this pull request Aug 21, 2015

Allow multiple data segments #17

Merged

lukewagner added 2 commits August 24, 2015 15:00

Switch all the commented invokes to asserteqs

4cd24d6

Don't display signatures by default

56fd096

lukewagner force-pushed the assert-everything branch from a3bd427 to 2cd545a Compare August 24, 2015 20:47

Update expected-output to match

7027991

lukewagner force-pushed the assert-everything branch from 2cd545a to 7027991 Compare August 24, 2015 20:52

lukewagner mentioned this pull request Aug 25, 2015

Add invalid command #22

Merged

rossberg added a commit that referenced this pull request Aug 25, 2015

Merge pull request #14 from WebAssembly/assert-everything

c2afa23

Assert everything

rossberg merged commit c2afa23 into master Aug 25, 2015

rossberg deleted the assert-everything branch August 26, 2015 13:15

rossberg mentioned this pull request Aug 29, 2015

AssertEq seems to reject any operand other than invoke or const #34

Closed

eqrion pushed a commit to eqrion/wasm-spec that referenced this pull request Jul 18, 2019

Fix memory.{init,drop} mistakes in overview (WebAssembly#14)

9319fa7

Using `memory.init` or `memory.drop` on an active segment is a validation error, not a trap.

ErikMcClure pushed a commit to innative-sdk/spec that referenced this pull request Jun 15, 2020

Merge pull request WebAssembly#14 from WebAssembly/test

5f120e8

Fix spec link

rossberg referenced this pull request in effect-handlers/wasm-spec Feb 15, 2021

[interpreter] Add ref type and call_ref instruction (#14)

8c16975

dhil pushed a commit to dhil/webassembly-spec that referenced this pull request Mar 2, 2023

Merge pull request WebAssembly#14 from KarlSchimpf/fix

ddf3e5a

Fix sentence structure.

backes pushed a commit to backes/spec that referenced this pull request Jul 12, 2023

[spec] Cherry-pick text format change from bulk-ops proposal (WebAsse…

f806a56

…mbly#14)

rossberg pushed a commit that referenced this pull request Mar 7, 2024

Remove wabt "Done" link (#14)

fb4683b

It seems like a wrong copy-and-paste from Firefox.

dhil pushed a commit to dhil/webassembly-spec that referenced this pull request Apr 12, 2024

Merge pull request WebAssembly#14 from WebAssembly/fgmccabe-patch-2

5ab6f57

Create Explainer.md

rossberg pushed a commit that referenced this pull request Sep 4, 2024

Update issue templates (#14)

224627e

Add template for suggesting new instructions.
