This repository has been archived by the owner on Apr 8, 2024. It is now read-only.

Testing strategy #5

Open
ehuss opened this issue Oct 11, 2018 · 17 comments

Comments

@ehuss
Contributor

ehuss commented Oct 11, 2018

What will the testing strategy look like?

This is a pretty broad but important topic. I'll just spew some random questions I had:

  • What corpus can be used?
  • How many new, hand-written tests should be made?
  • What types of tests? Pass/fail? Verifying error span/locations? Checking against expected AST/parse trees? Comparing against other rust parsers?
  • How can it be ensured the grammar doesn't fall too far behind rustc? Is there some way to test between them to quickly highlight divergence? (Maybe even running something in rust-lang/rust's CI to catch it at the PR stage?)
  • Depending on how slow the tests run, can they be segregated so that when experimenting with one part of the grammar (patterns for example), you can focus on subsets?
  • Should fuzzing be employed?

A few relevant links that I had lying around:

@CAD97
Contributor

CAD97 commented Oct 12, 2018

The most important thing, according to @matklad (and I agree), is that tests should be as close to the definitions as possible: each "unit" test should sit next to the definition of the production it exercises.

(No comment on larger testing harnesses at this time.)

@Centril
Contributor

Centril commented Oct 17, 2018

What corpus can be used?

Incrementally: we can start with just the basics we know,
then we can move to rustc's test suite as the corpus,
and eventually we can use the same set as crater uses.

What types of tests? Pass/fail? Verifying error span/locations? Checking against expected AST/parse trees? Comparing against other rust parsers?

Using spans is probably not needed here.

But we should test beyond mere pass/fail; we need to ensure that precedence and such things are interpreted correctly. To do that, we can convert the typed AST we make into syn's & libsyntax's ASTs and then compare against what syn & libsyntax parse things to. I think using syn is particularly neat here since its parser is quite modular.

How can it be ensured the grammar doesn't fall too far behind rustc?

Eventually, we can at least require patches to the grammar before stabilization of features.

We also need to include features and such so that we track rustc.

Should fuzzing be employed?

If by "fuzzing" you mean property based testing, then I think we should consider that eventually :)
We can generate programs given the canonical grammar and then test against libsyntax and syn.
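
The generate-and-check idea can be sketched without any property-testing framework. Everything below (`gen_expr`, the xorshift seed, the `balanced` stand-in for a reference parser) is hypothetical illustration, not WG code:

```rust
// Hypothetical sketch: generate random programs from a toy expression
// grammar and feed them to a checker standing in for libsyntax/syn.
// `gen_expr` and `balanced` are illustrative names, not WG APIs.

/// Tiny xorshift PRNG so the sketch needs no external crates.
fn next(seed: &mut u64) -> u64 {
    *seed ^= *seed << 13;
    *seed ^= *seed >> 7;
    *seed ^= *seed << 17;
    *seed
}

/// Derive a random expression from the toy grammar
///   Expr = "x" | "(" Expr " + " Expr ")" | "-" Expr
fn gen_expr(depth: u32, seed: &mut u64) -> String {
    match if depth == 0 { 0 } else { next(seed) % 3 } {
        0 => "x".to_string(),
        1 => format!("({} + {})", gen_expr(depth - 1, seed), gen_expr(depth - 1, seed)),
        _ => format!("-{}", gen_expr(depth - 1, seed)),
    }
}

/// Stand-in for "does the reference parser accept this?"; here we only
/// check that parentheses balance.
fn balanced(s: &str) -> bool {
    let mut depth = 0i32;
    for c in s.chars() {
        match c {
            '(' => depth += 1,
            ')' => depth -= 1,
            _ => {}
        }
        if depth < 0 {
            return false;
        }
    }
    depth == 0
}

fn main() {
    let mut seed = 0x2545_F491_4F6C_DD1D_u64;
    for _ in 0..100 {
        let program = gen_expr(4, &mut seed);
        assert!(balanced(&program), "bad program: {program}");
    }
    println!("100 generated programs accepted");
}
```

In the real setting, `balanced` would be replaced by "libsyntax and syn both accept the input and agree on its structure".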

@matklad
Member

matklad commented Oct 19, 2018

So to do that we can convert the typed AST we make into syn & libsyntax's ASTs and then compare what syn & libsyntax parse things to.

Please don't make this the primary testing strategy. It's better to serialize the concrete/abstract syntax to some textual format (JSON, s-expressions, bespoke). That way, you don't depend on a particular AST library's quirks, and you make it much easier to use the test suite for other parsers.
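
A neutral textual dump along these lines might look like the following sketch; `Node` and `to_sexpr` are made-up names, and the s-expression shape is just one possible choice:

```rust
// Hypothetical sketch of dumping a syntax tree to s-expressions, so that
// test expectations do not depend on any one AST library's types.
// `Node` and `to_sexpr` are illustrative names, not a real API.

enum Node {
    Token(&'static str, String),     // kind, source text
    Branch(&'static str, Vec<Node>), // kind, children
}

fn to_sexpr(node: &Node, out: &mut String, indent: usize) {
    let pad = "  ".repeat(indent);
    match node {
        Node::Token(kind, text) => {
            out.push_str(&format!("{pad}({kind} {text:?})\n"));
        }
        Node::Branch(kind, children) => {
            out.push_str(&format!("{pad}({kind}\n"));
            for child in children {
                to_sexpr(child, out, indent + 1);
            }
            out.push_str(&format!("{pad})\n"));
        }
    }
}

fn main() {
    let name = Node::Branch("NAME", vec![Node::Token("IDENT", "S".to_string())]);
    let mut out = String::new();
    to_sexpr(&name, &mut out, 0);
    print!("{out}");
}
```

Any parser that can walk its own tree can emit this form, so the same expected-output files serve every implementation.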

@Centril
Contributor

Centril commented Oct 19, 2018

@matklad oh sure; that also seems easier for rapid prototyping purposes? don't need to recompile so frequently?

@matklad
Member

matklad commented Oct 19, 2018

Yep, and you can use the difference crate to render the difference for trees, and you can implement "show syntax tree of this code" in your editor of choice :)
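
The `difference` crate computes proper edit-distance diffs; as a sketch of the idea only, a hypothetical `naive_diff` helper (not the crate's API) that flags mismatched lines between two tree dumps is a few lines:

```rust
// Hypothetical helper: flag lines that differ between an expected and an
// actual tree dump. A real diff (e.g. the `difference` crate) would align
// insertions/deletions; this naive version only compares position by position.
fn naive_diff(expected: &str, actual: &str) -> Vec<String> {
    let e: Vec<&str> = expected.lines().collect();
    let a: Vec<&str> = actual.lines().collect();
    let mut out = Vec::new();
    for i in 0..e.len().max(a.len()) {
        match (e.get(i), a.get(i)) {
            (Some(x), Some(y)) if x == y => out.push(format!("  {x}")),
            (x, y) => {
                if let Some(x) = x {
                    out.push(format!("- {x}"));
                }
                if let Some(y) = y {
                    out.push(format!("+ {y}"));
                }
            }
        }
    }
    out
}

fn main() {
    for line in naive_diff("ROOT\n  NAME", "ROOT\n  PATH") {
        println!("{line}");
    }
}
```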

@Centril
Contributor

Centril commented Oct 19, 2018

@matklad of the aforementioned serialization formats, JSON seems like the most cost-effective, implementation-time-wise...?

@matklad
Member

matklad commented Oct 19, 2018

@Centril JSON would definitely be the least controversial, though I find a simple hand-rolled format preferable, due to better human-readability and diff-ability:

struct S {
    foo: u32
}
ROOT@[0; 25)
  STRUCT_DEF@[0; 25)
    STRUCT_KW@[0; 6)
    WHITESPACE@[6; 7)
    NAME@[7; 8)
      IDENT@[7; 8) "S"
    WHITESPACE@[8; 9)
    NAMED_FIELD_DEF_LIST@[9; 25)
      L_CURLY@[9; 10)
      WHITESPACE@[10; 15)
      NAMED_FIELD_DEF@[15; 23)
        NAME@[15; 18)
          IDENT@[15; 18) "foo"
        COLON@[18; 19)
        WHITESPACE@[19; 20)
        PATH_TYPE@[20; 23)
          PATH@[20; 23)
            PATH_SEGMENT@[20; 23)
              NAME_REF@[20; 23)
                IDENT@[20; 23) "u32"
      WHITESPACE@[23; 24)
      R_CURLY@[24; 25)

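A printer for this kind of hand-rolled format is small; the `Tree` type below is illustrative, not rust-analyzer's actual representation:

```rust
// Minimal sketch of the dump format above: each node prints as
// KIND@[start; end), indented by depth, with token text quoted.
// `Tree` and `dump` are made-up names for illustration only.

struct Tree {
    kind: &'static str,
    start: usize,
    end: usize,
    text: Option<String>, // Some(..) for tokens, None for interior nodes
    children: Vec<Tree>,
}

fn dump(t: &Tree, depth: usize, out: &mut String) {
    out.push_str(&"  ".repeat(depth));
    out.push_str(&format!("{}@[{}; {})", t.kind, t.start, t.end));
    if let Some(text) = &t.text {
        out.push_str(&format!(" {text:?}"));
    }
    out.push('\n');
    for child in &t.children {
        dump(child, depth + 1, out);
    }
}

fn main() {
    // The NAME/IDENT fragment from the struct example above.
    let tree = Tree {
        kind: "NAME",
        start: 7,
        end: 8,
        text: None,
        children: vec![Tree {
            kind: "IDENT",
            start: 7,
            end: 8,
            text: Some("S".into()),
            children: vec![],
        }],
    };
    let mut out = String::new();
    dump(&tree, 0, &mut out);
    print!("{out}");
}
```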
@Centril
Contributor

Centril commented Oct 19, 2018

Interesting; I think for JSON we might have:

{ "Root": {
   "StructDef": {
        "name": { "Ident": "S" },
        "fields": [ {
            "name": { "Ident": "foo" },
            "type": { "PathType": { } }
         } ]
     }
} }

(preserving whitespace and terminals is not something we want to do -- span info can be useful for debugging purposes)

@matklad
Member

matklad commented Oct 19, 2018

preserving whitespace and terminals is not something we want to do

Curious, what is the reasoning behind that? Specifying a parse tree seems more natural than specifying the AST: you don't have to decide what shape the AST should have, you just specify what exists in concrete syntax.

It would be cool if the spec could distinguish foo::<> from foo<> from foo, which looks like it does require terminals.

@Centril
Contributor

Centril commented Oct 19, 2018

Curious, what is the reasoning behind that?

At least, the canonical grammar as formalized in the GLL / BNF notation shouldn't reason about or be explicit about whitespace and such things, to keep the grammar formalization simple and readable. Similarly, we shouldn't care about error recovery and reporting.

If on the other hand, given the canonical grammar, you can still extract whitespace and terminal information from a generated parser, then by all means, I have no objections to that. :) But the work product of the WG should be the canonical grammar.

@matklad
Member

matklad commented Oct 19, 2018

At least, the canonical grammar as formalized in the GLL / BNF notation shouldn't reason about or be explicit about whitespace and such things, to keep the grammar formalization simple and readable

Agree here, but let's unpack this: whitespace & comments are a concern of the lexical specification, and it's the job of the lexer to remove them from the resulting token sequence. Agree that it doesn't make much sense to specify whitespace & comments in the test files (I do that in rust-analyzer, just because I care about heuristic-based attachment of whitespace to nodes), as long as offsets can be recovered.

However, terminals are very much a parser's concern, and should be reflected in the tests imo. From the AST's pov, foo::<>, foo<> and foo are all equivalent, but it would be useful, at the spec level, to say that :: is part of the turbofish syntax, and not just drop it on the floor as whitespace.

Similarly, we shouldn't care about error recovery and reporting.

Agree

@Centril
Contributor

Centril commented Oct 19, 2018

whitespace & comments are a concern of the lexical specification. And it's the job of the lexer to remove them from the resulting token sequence.

Oh sure;

but it would be useful, at the spec level, to say that :: is part of the turbofish syntax, and not just drop it on the floor as whitespace.

Hm... 🤔 -- it's not part of the abstract syntax of the language, at least; i.e., when we want to talk about the type system and such things, it doesn't matter. But you should be able to talk about it as "turbofish" syntax in the grammar itself (and write comments in various places...), which is then displayed in a reasonable way in the spec PDF.

@eddyb
Member

eddyb commented Oct 31, 2018

For the record, we're prototyping a Lua cross-checker (between gll-lua and @kyren's luster), and you can see the work in progress on the lua branch of gll.

It takes a parse forest from gll-lua and an AST from luster, and checks that there is at least one possible parse tree that matches the luster AST.

Note that the GLL (shared packed) parse forest is a DAG with maximal sharing and individual full-input parse trees are never instantiated. Their number is pretty much the possible combinations of ambiguous choices, which can get pretty high without extra disambiguation (and we don't have the infrastructure for that yet).

We can probably make this work with a JSON input for the non-GLL side (effectively "deserializing" on the fly), but to turn the GLL state into JSON (for external checking) would be just as hard, and would also require the external checker to understand ambiguity, on top of that.
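
The "at least one tree in the forest matches" check can be illustrated on a toy forest representation. Real GLL (shared packed) forests are DAGs with sharing and ambiguity attached to packed nodes, so this is a drastic simplification; `Forest`, `Ast`, and `matches` are made-up names:

```rust
// Toy version of the cross-check described above: a forest node is either a
// single labeled node or a set of ambiguous alternatives, and `matches` asks
// whether at least one fully disambiguated tree equals the reference AST.

enum Forest {
    Node(&'static str, Vec<Forest>),
    Amb(Vec<Forest>), // ambiguity: any one alternative may match
}

struct Ast(&'static str, Vec<Ast>);

fn matches(f: &Forest, a: &Ast) -> bool {
    match f {
        Forest::Amb(alts) => alts.iter().any(|alt| matches(alt, a)),
        Forest::Node(label, children) => {
            *label == a.0
                && children.len() == a.1.len()
                && children.iter().zip(&a.1).all(|(c, ac)| matches(c, ac))
        }
    }
}

fn main() {
    // An ambiguous parse with two shapes; only the first matches the AST.
    let forest = Forest::Amb(vec![
        Forest::Node("Add", vec![Forest::Node("x", vec![]), Forest::Node("Mul", vec![])]),
        Forest::Node("Mul", vec![Forest::Node("Add", vec![]), Forest::Node("z", vec![])]),
    ]);
    let ast = Ast("Add", vec![Ast("x", vec![]), Ast("Mul", vec![])]);
    assert!(matches(&forest, &ast));
    println!("at least one parse tree matches the reference AST");
}
```

The key property the toy version shares with the real check: ambiguity never forces enumerating whole trees up front; alternatives are only explored until one branch succeeds.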

@Centril
Contributor

Centril commented Oct 31, 2018

From today's meeting: we'll start simple with pass/fail and mainly testing against syn/libsyntax to ensure compliance.

@ehuss
Contributor Author

ehuss commented Aug 21, 2019

I've been experimenting with a new strategy for the test suite and I wanted to see what people think. The general idea is to create a "conformance test suite". It would consist of:

  • A canonical tree defined in Rust that can be serialized/deserialized.
    • Currently it is almost identical to syn, which captures almost everything about the original source, and which I think all parsers should be able to translate to (with varying degrees of completeness).
  • A large set of test files with sample code and the corresponding serialized tree output.
    • The general structure would be similar to how tree-sitter's tests are defined, see below.
    • I've been using RON since it is very readable and succinct, though the format isn't too important.
    • Each test can be annotated with flags if it requires special handling (like nightly(specialization) or edition2015).
    • Tests can use fragments of the grammar (though I haven't decided how granular).
  • A test runner library.
    • This will have various options that can be set, such as ignoring certain things if the parser does not support it.
  • Implementations for syn, gll, libsyntax, and rust_analyzer.
    • The gll test runner could check if at least one tree in the forest matches, reporting ambiguities and such (until we get disambiguation).
    • Other parsers could fairly easily import the repo and use the test infrastructure.

A sample of what a test might look like is:

===========
hello world
===========

fn main() { println!("Hello, world!"); }

---

ItemFn(
    sig: FnSignature(
        fn_token: Fn(Span(0, 2)),
        name: Ident("main", Span(3, 7)),
        paren_tokens: (Paren(Span(7, 8)), Paren(Span(8, 9))),
    ),
    block: Block(
        brace_tokens: (Brace(Span(10, 11)), Brace(Span(39, 40))),
        stmts: [
            StmtSemi(
                expr: ExprMacro(
                    mac: Macro(
                        path: Path(
                            segments: [
                                PathSegment(
                                    ident: Ident("println", Span(12, 19)),
                                ),
                            ],
                        ),
                        bang_token: Bang(Span(19, 20)),
                        delimiters: (Paren(Span(20, 21)), Paren(Span(36, 37))),
                        tokens: [
                            LitStr("Hello, world!", Span(21, 36)),
                        ],
                    ),
                ),
                semi_token: Semi(Span(37, 38)),
            ),
        ],
    ),
    span: Span(0, 40),
)

Multiple tests can be placed in a single file, so a lot of the little variants should be easy to scan visually.

WDYT? I think it would be useful to ensure all parsers interpret the language the same, and it would replace the current debug-output tests for the gll grammar. A risk is that it may be difficult to translate between different tree structures (though in my cursory scans, should be possible). Another concern is that it will be another mountain of about 200 structs to define the tree, and then mountains of boring translation code for the different parsers, though I'm willing to put it together.
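
The tree-sitter-style test layout above (a `===` banner naming the test, the source snippet, a `---` separator, then the expected tree) splits apart with a few lines of Rust; `split_tests` is a hypothetical helper, not part of any proposed runner:

```rust
// Hypothetical splitter for the test-file layout shown above. Each test is:
//   ====...      banner open
//   test name
//   ====...      banner close
//   <source>
//   ---
//   <expected serialized tree>
// Multiple tests per file are separated by the next banner.
fn split_tests(file: &str) -> Vec<(String, String, String)> {
    let mut tests = Vec::new();
    let mut lines = file.lines().peekable();
    while let Some(line) = lines.next() {
        if !line.starts_with("===") {
            continue;
        }
        let name = lines.next().unwrap_or("").trim().to_string();
        lines.next(); // skip the closing banner
        let mut source = String::new();
        while let Some(&l) = lines.peek() {
            if l.trim() == "---" {
                lines.next();
                break;
            }
            source.push_str(l);
            source.push('\n');
            lines.next();
        }
        let mut expected = String::new();
        while let Some(&l) = lines.peek() {
            if l.starts_with("===") {
                break; // next test begins
            }
            expected.push_str(l);
            expected.push('\n');
            lines.next();
        }
        tests.push((name, source.trim().to_string(), expected.trim().to_string()));
    }
    tests
}

fn main() {
    let file = "====\nhello world\n====\n\nfn main() {}\n\n---\n\nItemFn()\n";
    let tests = split_tests(file);
    assert_eq!(tests.len(), 1);
    println!("parsed test: {}", tests[0].0);
}
```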

@eddyb
Member

eddyb commented Aug 29, 2019

@ehuss If we transition to the dynamically typed approach, you could probably work with a dynamic representation for the other libraries too, which might reduce effort dramatically.
(Presumably anything that supports serde would work)

Any differences in names/structure could be normalized away with little code, without having to do anything for the parts that are the same everywhere.

@Centril
Contributor

Centril commented Sep 13, 2019

@ehuss Seems pretty reasonable, possibly with @eddyb's amendment. I agree that the RON format is readable, too.
