This repository has been archived by the owner on Apr 8, 2024. It is now read-only.

Testing strategy #5

Open
ehuss opened this issue Oct 11, 2018 · 17 comments

Comments

@ehuss
Contributor

ehuss commented Oct 11, 2018

What will the testing strategy look like?

This is a pretty broad but important topic. I'll just spew some random questions I had:

  • What corpus can be used?
  • How many new, hand-written tests should be made?
  • What types of tests? Pass/fail? Verifying error span/locations? Checking against expected AST/parse trees? Comparing against other rust parsers?
  • How can it be ensured the grammar doesn't fall too far behind rustc? Is there some way to test between them to quickly highlight divergence? (Maybe even running something in rust-lang/rust's CI to catch it at the PR stage?)
  • Depending on how slow the tests run, can they be segregated so that when experimenting with one part of the grammar (patterns for example), you can focus on subsets?
  • Should fuzzing be employed?

A few relevant links that I had lying around:

@CAD97
Contributor

CAD97 commented Oct 12, 2018

The most important thing, according to @matklad (and I agree), is that tests should be as close to the definitions as possible: each "unit" test should sit next to the definition of the production it exercises.

(No comment on larger testing harnesses at this time.)

@Centril
Contributor

Centril commented Oct 17, 2018

What corpus can be used?

Incrementally: we can start with just the basics we know,
then we can move to rustc's test suite as the corpus,
and eventually we can use the same set as crater uses.

What types of tests? Pass/fail? Verifying error span/locations? Checking against expected AST/parse trees? Comparing against other rust parsers?

Using spans is probably not needed here.

But we should test beyond mere pass/fail; we need to ensure that precedence and such things are interpreted correctly. To do that, we can convert the typed AST we make into syn's & libsyntax's ASTs and then compare against what syn & libsyntax parse things to. I think using syn is particularly neat here since its parser is quite modular.

How can it be ensured the grammar doesn't fall too far behind rustc?

Eventually, we can at least require patches to the grammar before stabilization of features.

We also need to include features and such so that we track rustc.

Should fuzzing be employed?

If by "fuzzing" you mean property based testing, then I think we should consider that eventually :)
We can generate programs given the canonical grammar and then test against libsyntax and syn.
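
The generate-and-check idea can be sketched without any property-testing framework. Everything below (`gen_expr`, the xorshift seed, the `balanced` stand-in for a reference parser) is hypothetical illustration, not WG code:

```rust
// Hypothetical sketch: generate random programs from a toy expression
// grammar and feed them to a checker standing in for libsyntax/syn.
// `gen_expr` and `balanced` are illustrative names, not WG APIs.

/// Tiny xorshift PRNG so the sketch needs no external crates.
fn next(seed: &mut u64) -> u64 {
    *seed ^= *seed << 13;
    *seed ^= *seed >> 7;
    *seed ^= *seed << 17;
    *seed
}

/// Derive a random expression from the toy grammar
///   Expr = "x" | "(" Expr " + " Expr ")" | "-" Expr
fn gen_expr(depth: u32, seed: &mut u64) -> String {
    match if depth == 0 { 0 } else { next(seed) % 3 } {
        0 => "x".to_string(),
        1 => format!("({} + {})", gen_expr(depth - 1, seed), gen_expr(depth - 1, seed)),
        _ => format!("-{}", gen_expr(depth - 1, seed)),
    }
}

/// Stand-in for "does the reference parser accept this?"; here we only
/// check that parentheses balance.
fn balanced(s: &str) -> bool {
    let mut depth = 0i32;
    for c in s.chars() {
        match c {
            '(' => depth += 1,
            ')' => depth -= 1,
            _ => {}
        }
        if depth < 0 {
            return false;
        }
    }
    depth == 0
}

fn main() {
    let mut seed = 0x2545_F491_4F6C_DD1D_u64;
    for _ in 0..100 {
        let program = gen_expr(4, &mut seed);
        assert!(balanced(&program), "bad program: {program}");
    }
    println!("100 generated programs accepted");
}
```

In the real setting, `balanced` would be replaced by "libsyntax and syn both accept the input and agree on its structure".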

@matklad
Member

matklad commented Oct 19, 2018

So to do that we can convert the typed AST we make into syn & libsyntax's ASTs and then compare what syn & libsyntax parse things to.

Please don't make this the primary testing strategy. It's better to serialize the concrete/abstract syntax to some textual format (JSON, s-expressions, bespoke). That way, you don't depend on a particular AST library's quirks, and you make it much easier to use the test suite for other parsers.
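
A neutral textual dump along these lines might look like the following sketch; `Node` and `to_sexpr` are made-up names, and the s-expression shape is just one possible choice:

```rust
// Hypothetical sketch of dumping a syntax tree to s-expressions, so that
// test expectations do not depend on any one AST library's types.
// `Node` and `to_sexpr` are illustrative names, not a real API.

enum Node {
    Token(&'static str, String),     // kind, source text
    Branch(&'static str, Vec<Node>), // kind, children
}

fn to_sexpr(node: &Node, out: &mut String, indent: usize) {
    let pad = "  ".repeat(indent);
    match node {
        Node::Token(kind, text) => {
            out.push_str(&format!("{pad}({kind} {text:?})\n"));
        }
        Node::Branch(kind, children) => {
            out.push_str(&format!("{pad}({kind}\n"));
            for child in children {
                to_sexpr(child, out, indent + 1);
            }
            out.push_str(&format!("{pad})\n"));
        }
    }
}

fn main() {
    let name = Node::Branch("NAME", vec![Node::Token("IDENT", "S".to_string())]);
    let mut out = String::new();
    to_sexpr(&name, &mut out, 0);
    print!("{out}");
}
```

Any parser that can walk its own tree can emit this form, so the same expected-output files serve every implementation.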

@Centril
Contributor

Centril commented Oct 19, 2018

@matklad oh sure; that also seems easier for rapid prototyping purposes? don't need to recompile so frequently?

@matklad
Member

matklad commented Oct 19, 2018

Yep, and you can use the difference crate to render the difference for trees, and you can implement "show syntax tree of this code" in your editor of choice :)
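
The `difference` crate computes proper edit-distance diffs; as a sketch of the idea only, a hypothetical `naive_diff` helper (not the crate's API) that flags mismatched lines between two tree dumps is a few lines:

```rust
// Hypothetical helper: flag lines that differ between an expected and an
// actual tree dump. A real diff (e.g. the `difference` crate) would align
// insertions/deletions; this naive version only compares position by position.
fn naive_diff(expected: &str, actual: &str) -> Vec<String> {
    let e: Vec<&str> = expected.lines().collect();
    let a: Vec<&str> = actual.lines().collect();
    let mut out = Vec::new();
    for i in 0..e.len().max(a.len()) {
        match (e.get(i), a.get(i)) {
            (Some(x), Some(y)) if x == y => out.push(format!("  {x}")),
            (x, y) => {
                if let Some(x) = x {
                    out.push(format!("- {x}"));
                }
                if let Some(y) = y {
                    out.push(format!("+ {y}"));
                }
            }
        }
    }
    out
}

fn main() {
    for line in naive_diff("ROOT\n  NAME", "ROOT\n  PATH") {
        println!("{line}");
    }
}
```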

@Centril
Contributor

Centril commented Oct 19, 2018

@matklad of the aforementioned serialization formats, JSON seems like the most cost-effective, implementation-time-wise...?

@matklad
Member

matklad commented Oct 19, 2018

@Centril JSON would definitely be the least controversial, though I find a simple hand-rolled format preferable, due to better human-readability and diff-ability:

struct S {
    foo: u32
}
ROOT@[0; 25)
  STRUCT_DEF@[0; 25)
    STRUCT_KW@[0; 6)
    WHITESPACE@[6; 7)
    NAME@[7; 8)
      IDENT@[7; 8) "S"
    WHITESPACE@[8; 9)
    NAMED_FIELD_DEF_LIST@[9; 25)
      L_CURLY@[9; 10)
      WHITESPACE@[10; 15)
      NAMED_FIELD_DEF@[15; 23)
        NAME@[15; 18)
          IDENT@[15; 18) "foo"
        COLON@[18; 19)
        WHITESPACE@[19; 20)
        PATH_TYPE@[20; 23)
          PATH@[20; 23)
            PATH_SEGMENT@[20; 23)
              NAME_REF@[20; 23)
                IDENT@[20; 23) "u32"
      WHITESPACE@[23; 24)
      R_CURLY@[24; 25)

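A printer for this kind of hand-rolled format is small; the `Tree` type below is illustrative, not rust-analyzer's actual representation:

```rust
// Minimal sketch of the dump format above: each node prints as
// KIND@[start; end), indented by depth, with token text quoted.
// `Tree` and `dump` are made-up names for illustration only.

struct Tree {
    kind: &'static str,
    start: usize,
    end: usize,
    text: Option<String>, // Some(..) for tokens, None for interior nodes
    children: Vec<Tree>,
}

fn dump(t: &Tree, depth: usize, out: &mut String) {
    out.push_str(&"  ".repeat(depth));
    out.push_str(&format!("{}@[{}; {})", t.kind, t.start, t.end));
    if let Some(text) = &t.text {
        out.push_str(&format!(" {text:?}"));
    }
    out.push('\n');
    for child in &t.children {
        dump(child, depth + 1, out);
    }
}

fn main() {
    // The NAME/IDENT fragment from the struct example above.
    let tree = Tree {
        kind: "NAME",
        start: 7,
        end: 8,
        text: None,
        children: vec![Tree {
            kind: "IDENT",
            start: 7,
            end: 8,
            text: Some("S".into()),
            children: vec![],
        }],
    };
    let mut out = String::new();
    dump(&tree, 0, &mut out);
    print!("{out}");
}
```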
@Centril
Contributor

Centril commented Oct 19, 2018

Interesting; I think for JSON we might have:

{ "Root": {
   "StructDef": {
        "name": { "Ident": "S" },
        "fields": [ {
            "name": { "Ident": "foo" },
            "type": { "PathType": { } }
         } ]
     }
} }

(preserving whitespace and terminals is not something we want to do -- span info can be useful for debugging purposes)

@matklad
Member

matklad commented Oct 19, 2018

preserving whitespace and terminals is not something we want to do

Curious, what is the reasoning behind that? Specifying a parse tree seems more natural than specifying the AST: you don't have to decide what shape the AST should have, you just specify what exists in concrete syntax.

It would be cool if the spec could distinguish foo::<> from foo<> from foo, which looks like it does require terminals.

@Centril
Contributor

Centril commented Oct 19, 2018

Curious, what is the reasoning behind that?

At least, the canonical grammar as formalized in the GLL / BNF notation shouldn't reason about or be explicit about whitespace and such things, to keep the grammar formalization simple and readable. Similarly, we shouldn't care about error recovery and reporting.

If on the other hand, given the canonical grammar, you can still extract whitespace and terminal information from a generated parser, then by all means, I have no objections to that. :) But the work product of the WG should be the canonical grammar.

@matklad
Member

matklad commented Oct 19, 2018

At least, the canonical grammar as formalized in the GLL / BNF notation shouldn't reason about or be explicit about whitespace and such things, to keep the grammar formalization simple and readable

Agree here, but let's unpack this: whitespace & comments are a concern of the lexical specification, and it's the job of the lexer to remove them from the resulting token sequence. Agree that it doesn't make much sense to specify whitespace & comments in the test files (I do that in rust-analyzer, just because I care about heuristic-based attachment of whitespace to nodes), as long as offsets can be recovered.

However, terminals are very much a parser's concern, and should be reflected in the tests imo. From the AST's pov, foo::<>, foo<> and foo are all equivalent, but it would be useful, at the spec level, to say that :: is part of the turbofish syntax, and not just drop it on the floor as whitespace.

Similarly, we shouldn't care about error recovery and reporting.

Agree

@Centril
Contributor

Centril commented Oct 19, 2018

whitespace & comments are a concern of the lexical specification. And it's the job of the lexer to remove them from the resulting token sequence.

Oh sure;

but it would be useful, at the spec level, to say that :: is part of the turbofish syntax, and not just drop it on the floor as whitespace.

Hm... 🤔 -- it's not part of the abstract syntax of the language, at least; i.e., when we want to talk about the type system and such things, it doesn't matter. But you should be able to talk about it as "turbofish" syntax in the grammar itself (and write comments in various places...), which is then displayed in a reasonable way in the spec PDF.

@eddyb
Member

eddyb commented Oct 31, 2018

For the record, we're prototyping a Lua cross-checker (between gll-lua and @kyren's luster), and you can see the work in progress on the lua branch of gll.

It takes a parse forest from gll-lua and an AST from luster, and checks that there is at least one possible parse tree that matches the luster AST.

Note that the GLL (shared packed) parse forest is a DAG with maximal sharing and individual full-input parse trees are never instantiated. Their number is pretty much the possible combinations of ambiguous choices, which can get pretty high without extra disambiguation (and we don't have the infrastructure for that yet).

We can probably make this work with a JSON input for the non-GLL side (effectively "deserializing" on the fly), but to turn the GLL state into JSON (for external checking) would be just as hard, and would also require the external checker to understand ambiguity, on top of that.
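
The "at least one tree in the forest matches" check can be illustrated on a toy forest representation. Real GLL (shared packed) forests are DAGs with sharing and ambiguity attached to packed nodes, so this is a drastic simplification; `Forest`, `Ast`, and `matches` are made-up names:

```rust
// Toy version of the cross-check described above: a forest node is either a
// single labeled node or a set of ambiguous alternatives, and `matches` asks
// whether at least one fully disambiguated tree equals the reference AST.

enum Forest {
    Node(&'static str, Vec<Forest>),
    Amb(Vec<Forest>), // ambiguity: any one alternative may match
}

struct Ast(&'static str, Vec<Ast>);

fn matches(f: &Forest, a: &Ast) -> bool {
    match f {
        Forest::Amb(alts) => alts.iter().any(|alt| matches(alt, a)),
        Forest::Node(label, children) => {
            *label == a.0
                && children.len() == a.1.len()
                && children.iter().zip(&a.1).all(|(c, ac)| matches(c, ac))
        }
    }
}

fn main() {
    // An ambiguous parse with two shapes; only the first matches the AST.
    let forest = Forest::Amb(vec![
        Forest::Node("Add", vec![Forest::Node("x", vec![]), Forest::Node("Mul", vec![])]),
        Forest::Node("Mul", vec![Forest::Node("Add", vec![]), Forest::Node("z", vec![])]),
    ]);
    let ast = Ast("Add", vec![Ast("x", vec![]), Ast("Mul", vec![])]);
    assert!(matches(&forest, &ast));
    println!("at least one parse tree matches the reference AST");
}
```

The key property the toy version shares with the real check: ambiguity never forces enumerating whole trees up front; alternatives are only explored until one branch succeeds.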

@Centril
Contributor

Centril commented Oct 31, 2018

From today's meeting: we'll start simple with pass/fail and mainly testing against syn/libsyntax to ensure compliance.

@ehuss
Contributor Author

ehuss commented Aug 21, 2019

I've been experimenting with a new strategy for the test suite and I wanted to see what people think. The general idea is to create a "conformance test suite". It would consist of:

  • A canonical tree defined in Rust that can be serialized/deserialized.
    • Currently it is almost identical to syn, which captures almost everything about the original source, and which I think all parsers should be able to translate to (with varying degrees of completeness).
  • A large set of test files with sample code and the corresponding serialized tree output.
    • The general structure would be similar to how tree-sitter's tests are defined, see below.
    • I've been using RON since it is very readable and succinct, though the format isn't too important.
    • Each test can be annotated with flags if it requires special handling (like nightly(specialization) or edition2015).
    • Tests can use fragments of the grammar (though I haven't decided how granular).
  • A test runner library.
    • This will have various options that can be set, such as ignoring certain things if the parser does not support it.
  • Implementations for syn, gll, libsyntax, and rust_analyzer.
    • The gll test runner could check if at least one tree in the forest matches, reporting ambiguities and such (until we get disambiguation).
    • Other parsers could fairly easily import the repo and use the test infrastructure.

A sample of what a test might look like is:

===========
hello world
===========

fn main() { println!("Hello, world!"); }

---

ItemFn(
    sig: FnSignature(
        fn_token: Fn(Span(0, 2)),
        name: Ident("main", Span(3, 7)),
        paren_tokens: (Paren(Span(7, 8)), Paren(Span(8, 9))),
    ),
    block: Block(
        brace_tokens: (Brace(Span(10, 11)), Brace(Span(39, 40))),
        stmts: [
            StmtSemi(
                expr: ExprMacro(
                    mac: Macro(
                        path: Path(
                            segments: [
                                PathSegment(
                                    ident: Ident("println", Span(12, 19)),
                                ),
                            ],
                        ),
                        bang_token: Bang(Span(19, 20)),
                        delimiters: (Paren(Span(20, 21)), Paren(Span(36, 37))),
                        tokens: [
                            LitStr("Hello, world!", Span(21, 36)),
                        ],
                    ),
                ),
                semi_token: Semi(Span(37, 38)),
            ),
        ],
    ),
    span: Span(0, 40),
)

Multiple tests can be placed in a single file, so a lot of the little variants should be easy to scan visually.

WDYT? I think it would be useful to ensure all parsers interpret the language the same, and it would replace the current debug-output tests for the gll grammar. A risk is that it may be difficult to translate between different tree structures (though in my cursory scans, should be possible). Another concern is that it will be another mountain of about 200 structs to define the tree, and then mountains of boring translation code for the different parsers, though I'm willing to put it together.
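
The tree-sitter-style test layout above (a `===` banner naming the test, the source snippet, a `---` separator, then the expected tree) splits apart with a few lines of Rust; `split_tests` is a hypothetical helper, not part of any proposed runner:

```rust
// Hypothetical splitter for the test-file layout shown above. Each test is:
//   ====...      banner open
//   test name
//   ====...      banner close
//   <source>
//   ---
//   <expected serialized tree>
// Multiple tests per file are separated by the next banner.
fn split_tests(file: &str) -> Vec<(String, String, String)> {
    let mut tests = Vec::new();
    let mut lines = file.lines().peekable();
    while let Some(line) = lines.next() {
        if !line.starts_with("===") {
            continue;
        }
        let name = lines.next().unwrap_or("").trim().to_string();
        lines.next(); // skip the closing banner
        let mut source = String::new();
        while let Some(&l) = lines.peek() {
            if l.trim() == "---" {
                lines.next();
                break;
            }
            source.push_str(l);
            source.push('\n');
            lines.next();
        }
        let mut expected = String::new();
        while let Some(&l) = lines.peek() {
            if l.starts_with("===") {
                break; // next test begins
            }
            expected.push_str(l);
            expected.push('\n');
            lines.next();
        }
        tests.push((name, source.trim().to_string(), expected.trim().to_string()));
    }
    tests
}

fn main() {
    let file = "====\nhello world\n====\n\nfn main() {}\n\n---\n\nItemFn()\n";
    let tests = split_tests(file);
    assert_eq!(tests.len(), 1);
    println!("parsed test: {}", tests[0].0);
}
```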

@eddyb
Member

eddyb commented Aug 29, 2019

@ehuss If we transition to the dynamically typed approach, you could probably work with a dynamic representation for the other libraries too, which might reduce effort dramatically.
(Presumably anything that supports serde would work)

Any differences in names/structure could be normalized away with little code, without having to do anything for the parts that are the same everywhere.

@Centril
Contributor

Centril commented Sep 13, 2019

@ehuss Seems pretty reasonable, possibly with @eddyb's amendment. I agree that the RON format is readable, too.
