RFC: subtests, hooks, and parallel tests via `Deno.Tester` API #10771

lucacasonato · 2021-05-26T21:03:47Z

lucacasonato
May 26, 2021
Maintainer

Users have long been asking for native support for subtests / sub steps / groups, {before/after}{Each/All} hooks, and single-module parallel test execution. This proposal adds support for all of the above through an idiomatic and very minimal API, inspired by the Go testing package.

Interface

The proposed API revolves around a new Deno.Tester class. This class is not constructible by the user. Instead, an instance is passed to the user as the first argument to the test function passed to Deno.test. Deno.Tester would have a single method called step, with the same signature as Deno.test, except that it returns a Promise<boolean> indicating the result of the step. This promise never rejects. Here is what the test-related type definitions would look like:

declare namespace Deno {
  export interface TestStepDefinition {
    fn: (t: Tester) => void | Promise<void>;
    name: string;
    ignore?: boolean;
    /** Check that the number of async completed ops after the test is the same
     * as number of dispatched ops. Defaults to true.*/
    sanitizeOps?: boolean;
    /** Ensure the test case does not "leak" resources - ie. the resource table
     * after the test has exactly the same contents as before the test. Defaults
     * to true. */
    sanitizeResources?: boolean;
    /** Ensure the test case does not prematurely cause the process to exit,
     * for example via a call to `Deno.exit`. Defaults to true. */
    sanitizeExit?: boolean;
  }

  export interface TestDefinition extends TestStepDefinition {
    /** If at least one test has `only` set to true, only run tests that have
     * `only` set to true and fail the test suite. */
    only?: boolean;

    /** Optional metadata about the sub steps that will be registered.
     * Specifying this aids the test runner in filtering. This property is
     * entirely optional, and is usually only populated by testing frameworks
     * built on top of `Deno.test`.
     *
     * An example of how this API is to be used:
     *
     * ```ts
     * Deno.test({
     *   name: "group",
     *   async fn(t) {
     *     await t.step("step 1", async (t) => {
     *       await t.step("sub step 1", () => {});
     *     });
     *     await t.step("step 2", () => {});
     *   },
     *   stepMetadata: [
     *     { name: "step 1", steps: [{ name: "sub step 1" }] },
     *     { name: "step 2" },
     *   ],
     * });
     * ```
     * */
    stepMetadata?: TestStepMetadata[];
  }

  export interface TestStepMetadata {
    name: string;
    steps?: TestStepMetadata[];
  }

  /** Register a test which will be run when `deno test` is used on the command
   * line and the containing module looks like a test module.
   * `fn` can be async if required.
   */
  export function test(t: TestDefinition): void;

  /** Register a test which will be run when `deno test` is used on the command
   * line and the containing module looks like a test module.
   * `fn` can be async if required.
   */
  export function test(
    name: string,
    fn: (t: Tester) => void | Promise<void>,
  ): void;

  export class Tester {
    /** Run a sub step of the parent test with a given name. Returns a promise
     * that resolves to a boolean signifying if the step completed successfully.
     * The returned promise never rejects. If the test was ignored, the promise
     * resolves to `false`.
     */
    step(t: TestStepDefinition): Promise<boolean>;

    /** Run a sub step of the parent test with a given name. Returns a promise
     * that resolves to a boolean signifying if the step completed successfully.
     * The returned promise never rejects. If the test was ignored, the promise
     * resolves to `false`.
     */
    step(name: string, fn: (t: Tester) => void | Promise<void>): Promise<boolean>;
  }
}
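
To make the "never rejects" contract concrete, here is a rough sketch of how a step's callback could be turned into a boolean. It is only an illustration: `runStepSketch` and `StepFn` are made-up names, and a real implementation would also record the error for the reporter and apply the sanitizers.

```ts
// Hypothetical stand-in for what `Tester.step` might do with its callback.
type StepFn = () => void | Promise<void>;

async function runStepSketch(fn: StepFn, ignore = false): Promise<boolean> {
  if (ignore) return false; // ignored steps resolve to `false`
  try {
    await fn(); // works for both sync and async step functions
    return true;
  } catch {
    // The failure is reported through the resolved value, never as a rejection.
    return false;
  }
}
```

Because the result is a plain boolean, a caller could decide to skip later steps that depend on a failed one without needing a try/catch around every step.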

Behaviours

Basic

Tests that are currently in the wild would continue to work as they do now:

/** Test with no subtest(s), no initializers or destructors. */
Deno.test("abs", () => {
  const got = Math.abs(-1);
  assertEquals(got, 1);
});

The first parameter of the test function is now a Deno.Tester. This can be used to run sub steps:

/** Test with substeps (sequential), but no initializers or destructors. */
const cases = [
  [1, 1, 2],
  [1, 2, 3],
  [-1, 2, 1],
];
Deno.test("addition (sequential)", async (t) => {
  for (const [a, b, expected] of cases) {
    await t.step(`${a}+${b}=${expected}`, () => {
      const actual = a + b;
      assertEquals(actual, expected);
    });
  }
});

Substeps can also run in parallel, but the resource sanitizers need to be disabled (more on that below):

/** Test with substeps (parallel), but no initializers or destructors. */
Deno.test("addition (parallel)", async (t) => {
  await Promise.all(
    cases.map(([a, b, expected]) =>
      t.step({
        name: `${a}+${b}=${expected}`,
        async fn() {
          const actual = a + b;
          await delay(100);
          assertEquals(actual, expected);
        },
        sanitizeResources: false,
        sanitizeOps: false,
        sanitizeExit: false,
      })
    ),
  );
});

Tests can have subgroups, which can have sub steps too:

/** Test with subgroups and sub steps */
Deno.test("group with subgroup", async (t) => {
  await t.step("subgroup", async (t) => {
    await t.step("case 1", () => {});
    await t.step("case 2", () => {});
  });
});

You can now also run setup and teardown steps before and after tests:

async function setupDatabase() {
  await delay(100); // setup database here
  return {
    destroy() {
      // destroy database here
    },
  };
}

/** Test with before and after hooks */
Deno.test("database tests", async (t) => {
  const database = await setupDatabase();

  await t.step("first step using database", () => {});
  await t.step("second step using database", () => {});

  database.destroy();
});

Sanitizers

Deno.test has three sanitizers as of now: the exit sanitizer, the resource sanitizer, and the op sanitizer. These work in slightly different ways, but all assume that only one test executes at a time. For now this is a hard limitation that cannot be circumvented. If you want to use any of these sanitizers, you cannot run tests in parallel (more on that later).

In the new model sanitizers should be hierarchical. A test sanitizes its own function body and all of its subtests. Each subtest gets its own sanitizer scope too. Sanitizer configuration should be inherited from the parent by default, but can be overridden in the options. A little graph for the "database tests" example above:

> "test 1" = sanitizer "test 1" start
<          = sanitizer end

>  "database tests"                                                     <
   > "first case using database" <   > "second case using database" <

Back to parallel tests: because of the parallel sanitizer limitation we have to make sure users cannot run two sibling tests at once if either of them has the resource / op / exit sanitizer enabled. This means that when a test starts while any sibling tests (or subtests of sibling tests) are still running, it fails immediately if either it or any of those ongoing tests has a sanitizer enabled.
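
A minimal sketch of the two rules in this section (inherit from the parent unless overridden, and refuse to start next to a running sibling when sanitizers are involved). All names here (`resolveSanitizers`, `mayStart`, `ALL_ON`) are hypothetical, and for simplicity the three sanitizers are treated as one group rather than checked individually.

```ts
// Hypothetical names throughout; none of this is part of the proposed API.
interface SanitizerOptions {
  sanitizeOps?: boolean;
  sanitizeResources?: boolean;
  sanitizeExit?: boolean;
}
type ResolvedSanitizers = Required<SanitizerOptions>;

// Top-level default: every sanitizer is on.
const ALL_ON: ResolvedSanitizers = {
  sanitizeOps: true,
  sanitizeResources: true,
  sanitizeExit: true,
};

// Inherit each setting from the parent unless the step overrides it.
function resolveSanitizers(
  parentOpts: ResolvedSanitizers,
  own: SanitizerOptions,
): ResolvedSanitizers {
  return {
    sanitizeOps: own.sanitizeOps ?? parentOpts.sanitizeOps,
    sanitizeResources: own.sanitizeResources ?? parentOpts.sanitizeResources,
    sanitizeExit: own.sanitizeExit ?? parentOpts.sanitizeExit,
  };
}

function anyEnabled(s: ResolvedSanitizers): boolean {
  return s.sanitizeOps || s.sanitizeResources || s.sanitizeExit;
}

// `ongoing` holds the resolved options of sibling tests (and their subtests)
// that are still running when `candidate` wants to start.
function mayStart(
  candidate: ResolvedSanitizers,
  ongoing: ResolvedSanitizers[],
): boolean {
  if (ongoing.length === 0) return true; // nothing runs in parallel
  return !anyEnabled(candidate) && !ongoing.some(anyEnabled);
}

const sequentialStep = resolveSanitizers(ALL_ON, {});
const parallelStep = resolveSanitizers(ALL_ON, {
  sanitizeOps: false,
  sanitizeResources: false,
  sanitizeExit: false,
});
```

With these values, `mayStart(parallelStep, [parallelStep])` allows the step to start, while `mayStart(sequentialStep, [parallelStep])` corresponds to the "fails immediately" case.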

Permissions

Just like with sanitizers, the same parallelization limitations apply. Only a single test can have permission sandboxing occur at a time. Two tests with different permissions cannot run in parallel. This means that when a test starts while any sibling tests (or subtests of sibling tests) with permissions different from the ones it requests are still running, it fails immediately.

Just like sanitizers, permissions are inherited by default but can be overridden.
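
The permission check could look roughly like this. It is a simplified sketch: `PermissionSet` models permissions as plain on/off flags, which is an assumption made for illustration (real permission descriptors are richer), and both function names are hypothetical.

```ts
// Simplified, hypothetical model: one boolean per permission name.
type PermissionSet = { [permission: string]: boolean };

// Two sets are equal if every permission mentioned by either has the same
// value in both; a missing entry counts as "not granted".
function samePermissions(a: PermissionSet, b: PermissionSet): boolean {
  const keys = Object.keys(a).concat(Object.keys(b));
  return keys.every((k) => (a[k] ?? false) === (b[k] ?? false));
}

// A test may only start if every ongoing sibling (or sibling subtest) runs
// with exactly the permissions it requests.
function mayStartWith(
  candidate: PermissionSet,
  ongoing: PermissionSet[],
): boolean {
  return ongoing.every((other) => samePermissions(candidate, other));
}
```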

No implicit subtest await

Tests do not implicitly await the completion of their subtests. Some users might expect subtests to be implicitly awaited before test completion, but this would make it too easy to accidentally run tests in parallel, resulting in issues with the sanitizer parallelization. A subtest that has not completed by the end of the parent test has "leaked". This should result in an immediate error. All subtests have to be completed by the end of the parent test function.
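
One way to picture the leak check is a small bookkeeping object that the runner consults when the parent function returns. This is a sketch only; `LeakTracker` and its methods are invented for illustration.

```ts
// Hypothetical bookkeeping for subtests started by one parent test.
class LeakTracker {
  private pending: string[] = [];

  // Called when a subtest starts.
  begin(label: string): void {
    this.pending.push(label);
  }

  // Called when a subtest's promise settles.
  end(label: string): void {
    this.pending = this.pending.filter((l) => l !== label);
  }

  // Called once the parent test function has finished.
  assertNoLeaks(parentLabel: string): void {
    if (this.pending.length > 0) {
      throw new Error(
        `"${parentLabel}" finished while subtests were still running: ` +
          this.pending.join(", "),
      );
    }
  }
}
```

A parent that forgets to `await` a step would reach `assertNoLeaks` while that step is still pending, which is the error case described above.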

Reporter output

Just like in Go, the tests would be output in hierarchical fashion with tab prefixes:

test abs ... ok
test database tests ...
    test first case using database ... ok
    test second case using database ... ok
ok
test group ...
    test subgroup 1 ...
        test subtest 1 ... ok
        test subtest 2 ... ok
    ok
    test subgroup 2 ...
        test subtest 1 ... ok
        test subtest 2 ... ok
    ok
ok
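
For illustration, output in this shape could be produced by a small recursive renderer like the one below. The `StepResult` type, the `renderResult` name, and the "FAILED" label for failing entries are assumptions, not part of the proposal.

```ts
// Hypothetical result tree for one test and its nested steps.
interface StepResult {
  label: string;
  ok: boolean;
  steps?: StepResult[];
}

// Leaves print on one line; groups print a header, their children one level
// deeper, and then their own verdict at the group's indentation.
function renderResult(
  result: StepResult,
  depth = 0,
  out: string[] = [],
): string[] {
  const indent = "    ".repeat(depth);
  const verdict = result.ok ? "ok" : "FAILED";
  const children = result.steps ?? [];
  if (children.length === 0) {
    out.push(`${indent}test ${result.label} ... ${verdict}`);
    return out;
  }
  out.push(`${indent}test ${result.label} ...`);
  for (const child of children) renderResult(child, depth + 1, out);
  out.push(`${indent}${verdict}`);
  return out;
}
```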

Filtering

Earlier iterations of this proposal did not provide a way to filter based on sub steps. This has now been resolved.

The main downside of imperative sub steps is that they cannot be filtered without running setup and teardown steps first. This is a challenge when implementing frameworks compatible with Jest or Mocha on top of Deno.

This is mitigated by the addition of a stepMetadata property on the options bag provided to Deno.test. This optional property allows a framework (or user) to give the Deno test runner a heads up about all the steps that it will imperatively invoke.

This gives the test runner enough information to provide filtering, even without having to run test blocks and their setup and teardown steps.

This stepMetadata is rather verbose and is not really meant to be used by users directly. It is meant to be used by frameworks which already have this information available. The stepMetadata does not need to be specified. If it isn't, substep filtering just doesn't work (this is no big deal).

How it works

Let's say a user has this script, and runs deno test --filter "sub step 1":

Deno.test({
  name: "group 1",
  async fn(t) {
    await t.step("step 1", async (t) => {
      await t.step("sub step 1", () => {});
    });
    await t.step("step 2", () => {});
  },
  stepMetadata: [
    { name: "step 1", steps: [{ name: "sub step 1" }] },
    { name: "step 2" },
  ],
});

Deno.test("top level test", () => {});

First, tests are registered with the Deno test runner through Deno.test. Internally the test runner now has a list of names for all registered tests / steps.

The test runner now filters the list of registered tests / steps based on the --filter argument. It finds "sub step 1", which is a child of "step 1", and that is a child of "group 1". This means that to run "sub step 1", the test runner will need to run "group 1", then "step 1", only then "sub step 1".

The test runner will not invoke "top level test", or "step 2" (the promise that the latter returns resolves to false).
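
The lookup described above can be sketched as a walk over the registered names. This assumes plain substring matching for --filter and models top-level tests with the same shape as `TestStepMetadata`; `collectPaths` is a hypothetical helper, not a proposed API.

```ts
interface TestStepMetadata {
  name: string;
  steps?: TestStepMetadata[];
}

// Collect the path (top-level test first) of every entry whose name matches.
function collectPaths(
  node: TestStepMetadata,
  filter: string,
  prefix: string[],
  out: string[][],
): void {
  const path = prefix.concat(node.name);
  if (node.name.indexOf(filter) !== -1) out.push(path);
  for (const child of node.steps ?? []) {
    collectPaths(child, filter, path, out);
  }
}

// What the runner knows after registration, for the script above.
const registered: TestStepMetadata[] = [
  {
    name: "group 1",
    steps: [
      { name: "step 1", steps: [{ name: "sub step 1" }] },
      { name: "step 2" },
    ],
  },
  { name: "top level test" },
];

const selected: string[][] = [];
for (const entry of registered) {
  collectPaths(entry, "sub step 1", [], selected);
}
```

Here `selected` ends up holding the single path "group 1" > "step 1" > "sub step 1", so the runner knows which ancestors it has to enter and that everything else can be skipped.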

A building block

This API would significantly improve the usefulness of deno test, and would allow users to build frameworks exposing functions like describe, it, or beforeAll/afterAll without worrying about sanitizer issues or messing with test output on the terminal.

crookse · 2021-05-26T21:52:51Z

crookse
May 26, 2021

@lucacasonato, I like this a lot! Essentially, this would kill off Rhum and most of its functionality, but that's ok. Drash Land feels that there shouldn't be a need for a third party testing framework like Rhum just to get nesting and a different result output. We feel those features should be native to Deno. So to answer your question on how this would impact Rhum, Rhum would probably just end up being a mocking module or something -- or just completely dead and archived.


ebebbington · 2021-05-26T22:05:25Z

ebebbington
May 26, 2021

@lucacasonato pretty much what @crookse said! He hit the nail on the head.

We created Rhum because we felt test files and test output could be ‘improved’ (in quotes because somewhat subjective to personal preference), so your proposal will kill off Rhum because, essentially, what Rhum offers (or at least its main advantages) would be part of Deno. Meaning there wouldn’t be a need for most of Rhum’s functionality and ‘Rhum test files’ can go native instead (which isn’t a bad thing).

For what it’s worth, I really like your proposal!


KyleJune · 2021-05-26T22:35:03Z

KyleJune
May 26, 2021

How would test filtering work?
How would the only flag work?

I think other frameworks like jasmine have focusing a test group apply to all its children unless they are set to be ignored. That's what I did in my test_suite module, which uses Deno.test internally. The flags for a group become the default flags for all tests within that suite. At the top level, it just uses the Deno.test defaults.

Are there better proposals with same flexibility / functionality?

I liked the idea of a TestSuite from #4092 (comment) and created a module based on that idea. In my module I made it so that a suite can be passed as the first argument to the test function or it can be included in the test definition. Then for creating subgroups you would just include the parent suite in the subgroup's test suite definition. It supports sanitizeResources and sanitizeOps. I haven't added sanitizeExit but that would be easy to add. The readme has an example of grouping.

https://github.com/udibo/test_suite/tree/v0.7.0#testsuite

The thing I liked about the TestSuite idea was that it allows grouping without nesting. When you use the TestDefinition call signature, nesting can result in tests having a lot of indentation in front of them. Below I modified your example to use the TestDefinition call signature. You can see that with just 3 layers, the innermost function's body ends up with 12 spaces in front of it.

/** Test with subgroups and sub tests */
Deno.test({
  name: "group with subgroup",
  async fn(t) {
    await t.step({
      name: "subgroup",
      async fn(t) {
        await t.step({
          name: "case 1",
          fn() {
            // assertions here
          },
        });
        await t.step({
          name: "case 2",
          fn() {
            // assertions here
          },
        });
      },
    });
  },
});

Another advantage of a flat TestSuite style is that if you want to add groupings later, you could do so without having to modify every line for new indentation level. You would just create the new sub testsuite then update the tests that belong to it to have it as a suite argument.

I believe most people working with JavaScript and TypeScript are familiar with the describe/it syntax. A flat TestSuite style of test grouping can support userland implementing that type of grouping. I added optional describe/it functions to my module that use TestSuite internally.

I believe your proposal or TestSuite from the previous deno discussion I linked would make it easier for third parties to create test wrappers in the style they prefer. It would be easy to create a describe/it wrapper or a flat structured wrapper like TestSuite.

The TestSuite module I created would be unaffected by this change and it could easily be modified to use Tester internally for grouping.


Guergeiro · 2021-05-27T09:13:42Z

Guergeiro
May 27, 2021

Before anything, thank you @lucacasonato for reaching out to the @drashland team for an opinion. We already discussed the current Deno.test a lot internally, and sometimes how painful it is to work around it 😜

We REALLY wanted hooks + better output to be easier to deal with. We arrived at two possible outcomes for it:

Have Deno support it natively and be done with it. That would be the easy path for us (even if it meant deprecating Rhum) and it would keep it under Deno runtime. This would bring consistency across all testing in Deno, but would probably mean the ecosystem around testing would be killed (maybe too much negativity here).
When it came to output, instead of Deno handling the Test Result, that would be some kind of return value for Deno.test and the developer would be the one handling outputting whatever they wanted to output based on the result. This would probably be the Royal Rumble of testing ecosystem, with each framework doing what fits their needs.

To conclude, I think this proposal really fitted what we (@drashland) wanted most. The unification of testing.

A personal note: it is nice that I, as a developer, don't need to worry about which testing framework is the best for Deno, as I do in Node. It just comes out of the box. From a framework maintainer's point of view, it's bad because it kills a project in which time was invested. But for me, the positives outweigh the negatives (which are pretty much only pride 😄).


caspervonb · 2021-05-27T09:45:48Z

caspervonb
May 27, 2021

Nice, excellent elaborate write up @lucacasonato!

I'm not explicitly against this yet but it brings up the question of procedural vs declarative.

So far our approach has been mostly declarative, in that test definitions have been specified as objects so mixing procedural and declarative in this way might not be that great.

The Go test runner is procedural all the way through: to ignore a test you call the Skip method on the test context. We have that as an option instead, so we've already diverged to the point where there is very little purpose to a context like this.

In terms of handling our edge cases, to some degree this seems to introduce more than it solves.
The advantage of being declarative is that we "the test runner" can easily reason about things ahead of time before tests start running any test cases.

For example, we could easily figure out things like how to partition sets of tests. Like a set of tests marked as parallel safe with the same permissions can be partitioned and run in parallel based on their permission requirements.
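
Roughly what I mean by partitioning, as a sketch. The `permissions` field and the `partitionTests` helper are assumptions for the sake of the example; only `parallel` appears in the definitions below.

```ts
// Hypothetical declarative test record, known before anything runs.
interface DeclaredTest {
  label: string;
  parallel?: boolean;
  permissions: string[]; // e.g. ["net", "read"]; simplified for illustration
}

// Tests marked parallel-safe are batched by identical permission sets;
// everything else gets a batch of its own and runs alone.
function partitionTests(tests: DeclaredTest[]): DeclaredTest[][] {
  const batches: DeclaredTest[][] = [];
  const byPermissions = new Map<string, DeclaredTest[]>();
  for (const t of tests) {
    if (!t.parallel) {
      batches.push([t]);
      continue;
    }
    const key = t.permissions.slice().sort().join(",");
    let batch = byPermissions.get(key);
    if (!batch) {
      batch = [];
      byPermissions.set(key, batch);
      batches.push(batch);
    }
    batch.push(t);
  }
  return batches;
}
```

Each inner array could then be run concurrently under a single permission sandbox, one batch after another.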

The same applies to sanitisers. However, when a sanitiser fails it will be harder for you, the user, to pinpoint which test is leaking resources if N tests are run in parallel, so to debug you'd want to disable test case parallelism. This isn't a problem for the exit sanitiser; it only applies to ops and resources.

Potentially, we could be a lot smarter about tracking that, but since this is global state it's a non-trivial problem. Being able to temporarily disable parallelism with a flag for debugging seems like a good compromise here.

Filtering is also difficult with a procedural model like this, because we don't know what subtests there are until the middle of test execution.

Unlike Go we have top level execution with await, so we don't need functions like Skip to ignore tests.

We also don't need to execute code inside a function to generate a table test, we can just execute that in the top level scope, which is going to be prettier with time as we get do let blocks.

declare namespace Deno {
  export interface TestDefinition {
    fn: () => void | Promise<void>;
    name: string;
    ignore?: boolean;
    parallel?: boolean;
  }

  export interface TestCollection {
    entries: Array<TestDefinition | TestCollection>;
    name: string;
  }

  export function test(test: TestDefinition | TestCollection): void;
}

The basic version of Deno.test would look something like this:

Deno.test({
  name: "Array.prototype.push",
  entries: [
    {
      name: "it allows one element to be given",
      fn() {
        // ...
      },
    },
    {
      name: "it allows two elements to be given",
      fn() {
        // ...
      },
    },
  ],
});

Outputs:

Array.prototype.push
  it allows one element to be given ... ok
  it allows two elements to be given ... ok

Also, since we swallow the console, it probably makes sense to have some sort of context available.
But I see its main use being more in line with printing diagnostics, log messages and extra things to the test harness without getting involved with the scheduling of test cases.

Haven't really thought this through yet, but maybe a console like interface.

interface TestContext {
  assert();
  log();
  debug();
  // ...
}

This is loosely based on TAP, which prints lines like these as "comments".

3 replies

lucacasonato May 27, 2021
Maintainer Author

I think this misses one key point of my proposal though, namely a way to do beforeAll and afterAll hooks on groups.

I think grouping itself is only useful if this feature exists. It can not be implemented in user land without reimplementing the sanitizers in user land.

dsherret Jun 18, 2021
Maintainer

I agree with @caspervonb about how it would be better to be all declarative (test builder phase, possible filtering, then execution phase). It would solve the open question of how "only" and filtering would work... I don't see how they could with the proposed approach. There should be a way to do hooks in a declarative way.

Edit: I guess the framework would do the filtering? Also, another downside to a non-fully declarative approach is there's no way to tell how many tests are remaining for the deno test runner. I guess the frameworks could manage that too and handle displaying the output.

bartlomieju Jun 18, 2021
Maintainer

I agree with both @caspervonb and @dsherret. IMO we shouldn't change our declarative approach to a procedural one (or, even worse, a mixed one). Before/after hooks can be done with a declarative approach as well. Maybe we should consider a "builder pattern" API instead of an option bag (it seems to be quite popular in other JS testing frameworks).

getspooky · 2021-05-31T20:38:20Z

getspooky
May 31, 2021

Nice, excellent elaborate write up @lucacasonato!
For me, a describe block is not necessary. Everything in the file must be clearly tested, and including even a single level of nesting is pointless. This significantly improves our ability to understand what's going on in each test without having to do any scrolling around. If we had a few dozen more tests, the benefits would be even more potent.


ebebbington · 2021-06-18T13:33:15Z

ebebbington
Jun 18, 2021

Hey @lucacasonato, I was just curious whether the "test" prefix(es) in the output are needed? I wonder if they add unnecessary bloat. Would be interested to hear your thoughts on this (and others too if people want to chime in, after all, this is a discussion :) )

1 reply

ebebbington Jun 18, 2021

As clarified by @caspervonb, he has plans to remove this prefix (though I believe this isn't concrete). It's a minor detail in the reporter.

As such, my question is void, unless people still wish to chime into this thread.

There is an issue related to my original comment: #7840

yacinehmito · 2021-11-13T15:40:21Z

yacinehmito
Nov 13, 2021

How are the use cases for beforeEach and afterEach meant to be handled with or following that proposal?


zandaqo · 2021-11-13T16:18:08Z

zandaqo
Nov 13, 2021

@yacinehmito AFAIK, there are no special provisions for the beforeEach/afterEach hooks. If you are writing in BDD style, you can use (or take inspiration from) this gist where describe sections are treated as "groups", with it being test steps, and hooks being run prior/post each step.
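
If you'd rather not pull in a BDD layer, one possible userland pattern is to wrap each step function with the hooks yourself. This is just a sketch of that idea; `withHooks` is a made-up helper, not something the proposal provides.

```ts
type HookFn = () => void | Promise<void>;

// Returns a step function that runs `beforeEach`, then the body, and always
// runs `afterEach` afterwards, even if the body throws.
function withHooks(
  beforeEach: HookFn,
  afterEach: HookFn,
  body: HookFn,
): () => Promise<void> {
  return async () => {
    await beforeEach();
    try {
      await body();
    } finally {
      await afterEach();
    }
  };
}

// Intended use inside a test (shown as a comment, since `t` only exists
// inside a running test):
//   await t.step("case 1", withHooks(resetDb, closeDb, () => { /* ... */ }));
// `resetDb` and `closeDb` are placeholders for your own setup/teardown.
```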

1 reply

yacinehmito Nov 20, 2021

Thank you @zandaqo.

I think this reply should be indented below the original comment: #10771 (comment)

yacinehmito · 2021-11-20T12:43:38Z

yacinehmito
Nov 20, 2021

I'd like to address the following point on stepMetadata:

Optional metadata about the sub steps that will be registered. Specifying this aids the test runner in filtering. This property is entirely optional, and is usually only populated by testing frameworks built on top of Deno.test.

This feels like a departure in style from a core value proposition of Deno. This is personal, but my drive to depend on Deno as opposed to other runtimes is the well-integrated set of tools that removes the need for setting things up (TS support, dependency management, linting, formatting, testing). I don't think these tools ought to be full-fledged, but they should have just enough features to be used successfully by most projects. On that basis, I'd like to use the raw deno test and Deno.test, and only bring up a framework on top of it if I have specific testing needs (like behaviour testing, or mocking a browser context).

The current design effectively hampers the filtering feature. As stated in the comment, it is not reasonable for someone using Deno.test directly to fill in stepMetadata. However, it is also surprising to not be able to filter when relying on nested tests. It was only by reading the RFC that I found this limitation, which wouldn't be a great experience for users if the interface were to become stable. I believe filtering on nested tests to be among the core features that most projects would expect.

Suggestion to illustrate my point

As such, I suggest the introduction of t.beforeAll and t.afterAll. The former in particular would be able to populate a context, fed to the following steps.

Here is an example from the RFC, slightly modified:

Deno.test("database tests", async (t) => {
  const database = await setupDatabase();

  await t.step("first step using database", () => {
    console.log(database);
  });
  await t.step("second step using database", () => {});

  await database.destroy();
});

Here is what it would look like with t.beforeAll and t.afterAll.

Deno.test("database tests", async (t) => {
  t.beforeAll(async () => {
    return {
      database: await setupDatabase(),
    };
  });

  await t.step("first step using database", (t, { database }) => {
    console.log(database);
  });
  await t.step("second step using database", () => {});

  t.afterAll(async ({ database }) => {
    await database.destroy();
  });
});

Executing the test suite without executing the callbacks would allow building an object with all the steps, which would successfully allow filtering.

This is just a suggestion, so as to have a productive approach; I am not inviting comments on this particular suggestion, but rather on whether filtering ought to work out of the box for nested tests in Deno.

Are we open to revising the design in order to support this feature out of the box?


lucasfcosta · 2022-01-01T22:11:39Z

lucasfcosta
Jan 1, 2022

Hi all, this discussion has been greatly enlightening, and I have some input.

To make it more understandable, I have divided my thoughts into three sections:

1. Declarative vs. Procedural API
   I share some of @caspervonb's concerns around the procedural nature of the steps API and how concepts get mixed when using test and steps.
2. Implementing hooks
   How do we implement hooks considering the "declarative vs procedural" conundrum?
   Should we even implement hooks, or should we delegate that to third party library implementers?
3. Naming
   How should we name the methods in this API considering the two points above?
   Does it make sense to have a step API?

Items 2 and 3 depend on item 1.

1. A declarative vs. a procedural API

Currently, our approach to tests requires users to declare them twice. Once to implement the tests themselves and then to explain their tests' structure, which is already implicit in the way tests are nested.

This duplication requires a lot of effort from users and requires us to write more code on our end.

By following a declarative approach, we can separate the act of declaring tests from the act of running them.

My proposed API is similar to what @yacinehmito has suggested. The only difference is that it does not require users to await on a step itself. Instead, users will declare async functions, upon which the runner will await.

Users don't need to await on a particular step because when they "declare" a test, they are not telling it to run. They're simply saying that a test exists and has an asynchronous function associated with it.

// Notice how the Deno.test callback doesn't need to return a promise.
// It doesn't need to return a promise because it _schedules_ tests (in other words, it _declares_ them)
Deno.test("database tests", (t) => {
  // Notice how in the previous example you naturally
  // used an `async` callback, not an `await` on `beforeAll`
  t.beforeAll(async () => {
    return {
      database: await setupDatabase(),
    };
  });

  t.step("first step using database", async (t, { database }) => {
    await myAsyncTask();
  });
  
  t.step("second step using database", () => { /* ... */ });

  t.afterAll(async ({ database }) => {
    await database.destroy();
  });
});

Such separation:

Makes it trivial to implement hooks
Allows us to have more modular code
Facilitates the filtering and sanitizing of tests and their efficient scheduling
Opens up the possibility of third-party library implementers to hook into our declarative API to write their own runners or simply listen to our runner's events
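
To illustrate the two phases, here is a rough sketch of a declare-then-run split. Everything in it is hypothetical (`declareSuite`, `runSuite`, the `Plan` shape), and step callbacks receive only the context rather than `(t, context)` to keep it short.

```ts
type Ctx = Record<string, unknown>;
type DeclaredStepFn = (ctx: Ctx) => void | Promise<void>;

interface Plan {
  beforeAll: Array<() => Ctx | Promise<Ctx>>;
  steps: Array<{ label: string; fn: DeclaredStepFn }>;
  afterAll: Array<(ctx: Ctx) => void | Promise<void>>;
}

interface Declarer {
  beforeAll(fn: () => Ctx | Promise<Ctx>): void;
  step(label: string, fn: DeclaredStepFn): void;
  afterAll(fn: (ctx: Ctx) => void | Promise<void>): void;
}

// Phase 1: run the declaring callback synchronously. Nothing is executed;
// hooks and steps are only recorded.
function declareSuite(body: (t: Declarer) => void): Plan {
  const plan: Plan = { beforeAll: [], steps: [], afterAll: [] };
  body({
    beforeAll: (fn) => { plan.beforeAll.push(fn); },
    step: (label, fn) => { plan.steps.push({ label, fn }); },
    afterAll: (fn) => { plan.afterAll.push(fn); },
  });
  return plan;
}

// Phase 2: the runner filters by name first, then runs hooks and steps.
async function runSuite(plan: Plan, filter = ""): Promise<string[]> {
  const chosen = plan.steps.filter((s) => s.label.indexOf(filter) !== -1);
  const ran: string[] = [];
  if (chosen.length === 0) return ran; // no setup or teardown needed at all
  let ctx: Ctx = {};
  for (const hook of plan.beforeAll) ctx = { ...ctx, ...(await hook()) };
  try {
    for (const s of chosen) {
      await s.fn(ctx);
      ran.push(s.label);
    }
  } finally {
    for (const hook of plan.afterAll) await hook(ctx);
  }
  return ran;
}

// Declaring a suite only records it; `callLog` stays empty until runSuite.
const callLog: string[] = [];
const suitePlan = declareSuite((t) => {
  t.beforeAll(() => { callLog.push("setup"); return { db: "fake" }; });
  t.step("first step using database", (ctx) => {
    callLog.push(`first:${String(ctx["db"])}`);
  });
  t.step("second step using database", () => { callLog.push("second"); });
  t.afterAll(() => { callLog.push("teardown"); });
});
```

Because the step names are known after phase 1, filtering happens before any setup code runs, which is the property the imperative API lacks.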

2. Implementing hooks

2.1 How do we implement hooks?

With such an API, the beforeAll, beforeEach, afterAll, afterEach implementation becomes trivial. They schedule functions, and our runner knows when to execute those functions and which ones to filter out.

I feel like we should have these explicit hooks, as they cover all the possible points when a user may want to execute setup/teardown operations. Furthermore, these functions are already widely used in the JavaScript testing community, so everyone is familiar with them. Given its adoption by people and all other testing frameworks, I believe such an interface has been an enormous success.

2.2 Should Deno even implement hooks? If so, how?

In my view, it should.

If Deno wants to be "a productive and secure scripting environment for the modern programmer", it must come with everything users will need to test their code.

Now, everyone has a different view on what's essential. Still, I think the vast majority of people would agree that the set of crucial features includes an assert function and testing hooks for setting up and tearing down infrastructure for both individual tests and groups of tests.

I think the most important aspect up for debate is not whether Deno should implement hooks, but whether it wants to have the final word on how tests should be written.

A good balance between incentivizing standardization of good practices and having an open ecosystem would be to follow an approach similar to what Jest has done with jest-circus. Such an approach consists of implementing their own declarative API but emitting events that allow users to run hooks, or the tests themselves, however they want (and actually a bit more). This step takes us even further in the direction of separating the declaration of tests from their execution.

These are the advantages of such an approach:

We're still opinionated about how users declare tests, so there's standardization across Deno's ecosystem.
Furthermore, because events for each hook are still emitted in a particular order, we're also opinionated on which hooks users can implement and the semantics of each.
For example: beforeAll is always emitted before all tests within a group, even though a third party may want to do something different when it's emitted.
Third-party implementers can do whatever they want in terms of executing tests (and hooks).
Suppose someone wants to schedule each group of tests to run in a particular container. In that case, they can implement a runner that listens to the test events emitted and runs each within a particular container.
Furthermore, if third-party approaches to testing are successful, we can always bring them into the core.
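
As a rough illustration of the event idea (not jest-circus's actual event names or API), the core could emit a fixed sequence of events per group and leave the reaction to listeners. `RunnerEvent`, `Listener`, and `emitSuiteEvents` are hypothetical.

```ts
// Hypothetical event shapes, emitted in a fixed order for each group.
type RunnerEvent =
  | { type: "beforeAll"; suite: string }
  | { type: "test"; suite: string; label: string }
  | { type: "afterAll"; suite: string };

type Listener = (event: RunnerEvent) => void;

// The core stays opinionated about ordering; listeners decide what to do.
function emitSuiteEvents(
  suite: string,
  labels: string[],
  listeners: Listener[],
): void {
  const emit = (event: RunnerEvent) => {
    for (const listener of listeners) listener(event);
  };
  emit({ type: "beforeAll", suite });
  for (const label of labels) emit({ type: "test", suite, label });
  emit({ type: "afterAll", suite });
}

// A third-party listener; here it only records what it would execute.
const seenEvents: string[] = [];
emitSuiteEvents("Sales Module", ["popup", "checkout"], [
  (e) => seenEvents.push(e.type === "test" ? `test:${e.label}` : e.type),
]);
```

A container-based runner, for example, could start a container on "beforeAll", run each "test" inside it, and tear it down on "afterAll".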

3. Naming

Even though I like the step interface, I think its semantics should be different. Furthermore, I think we should make our APIs familiar with what people are already used to, especially considering the success of the most widely used APIs in the JS testing ecosystem.

That doesn't mean we necessarily need to name our APIs describe and it. I think most people actually misuse them, as these APIs are not supposed to simply mean you're "grouping" tests. These APIs exist to incentivize people to write tests that read like "given scenario X (describe) then my code (it) does Y". The problem is that most people just use describe to name a suite and it to name a test.

For example:

Expected usage (ideal):

describe("when buying alcoholic drinks", () => {
  describe("when buyers don't look like they're 25", () => {
    it("shows a confirmation popup asking for ID", () => { /* ... */ });
  });
  
  describe("when buyers look like they're over 25", () => {
    it("immediately charges the customer", () => { /* ... */ });
  });

  it("sends telemetry data to the central management system", () => { /* ... */ });
});

Actual usage:

describe("Sales Module", () => {
  it("Confirmation popup for under 25", () => { /* ... */ });
  it("Immediate checkout for over 25", () => { /* ... */ });
  it("Telemetry data calls", () => { /* ... */ });
});

Considering the above, I don't think we need to name our functions describe or it. Instead, we could name them suite and test, which matches the actual usage patterns illustrated above.

My only strong opinion is that I do think that APIs for recursive grouping (meaning you can nest as many "describes" as you want) and individual tests must exist. Hooks should exist too (as per the explanation above). For hooks, given how commonplace and successful they are, I'd also advise we keep the same names (as they also evoke similar semantics in people's minds).

3.1 What about steps? + A concrete usage example

I think that steps have a place in tests, but I think they should be used for a different purpose. In my view, steps represent an atomic part of a test. Steps cannot have beforeEach or afterEach hooks associated with them. They exist just so that users can have a more detailed output for critical parts of a test and so that runner implementers (as described in section 2) could perform a particular operation for each step of a test.

Elastic's synthetics runner is a concrete example of how useful it is to have a step API (and another argument for having a jest-circus approach to our testing infrastructure).

Disclaimer: I do work for Elastic.

The synthetics runner (which is nothing but a test runner, really), uses the step API to indicate when it should take a screenshot of a page so that users can more easily debug their synthetics tests.

Now, if you imagine that step preserves the semantics I described above and that we have a jest-circus-like system, anyone could hook into Deno to take screenshots for each of their tests' steps (as in the synthetics example), record a particular state of a database or HTML page for each step, or even send telemetry data to their servers.
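A rough sketch of that idea, with steps as atomic parts of a test and an observer notified after each one. `runStepped` and `StepObserver` are hypothetical names made up for this example (not Deno or Elastic synthetics APIs), and the "screenshot" is just a string pushed onto an array.

```typescript
// Hypothetical sketch: an observer is told about every step of a test.
type StepObserver = (info: { test: string; step: string; ok: boolean }) => void;

function runStepped(
  testName: string,
  body: (step: (name: string, fn: () => void) => void) => void,
  observer: StepObserver,
): void {
  body((stepName, fn) => {
    let ok = true;
    let error: unknown;
    try {
      fn();
    } catch (e) {
      ok = false;
      error = e;
    }
    // The observer runs whether the step passed or failed, so it can capture
    // state (a screenshot, a DB dump, telemetry) at that exact point.
    observer({ test: testName, step: stepName, ok });
    // Steps have no hooks of their own; a failure simply fails the test.
    if (!ok) throw error;
  });
}

const artifacts: string[] = [];

runStepped(
  "checkout flow",
  (step) => {
    step("open cart", () => {});
    step("pay", () => {});
  },
  ({ test, step, ok }) => {
    // Stand-in for "take a screenshot".
    artifacts.push(`${test}/${step}: ${ok ? "passed" : "failed"}`);
  },
);
```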

6 replies

lucasfcosta Jan 2, 2022

What I mean by “declare” is “express”. Sorry, I could’ve used better words to explain.

My point is that if you look at t.step and then the metadata passed, the metadata expresses the very structure already expressed in the nesting. There’s semantic duplication: the semantics of what’s already declared (the test structure) is expressed again.

In other words, I think the current API reads like:

Here are my test functions and how I want them to “be run” (nesting)
Let me explain how I declared them (adding metadata just because we can’t deduce it from the previous step)

By following a declarative approach, you can deduce structure from the declaration. Furthermore, you can separate declaration from execution.

Hope that makes sense, it’s a small change in the way we think about it, but that yields huge benefits.

Thanks for taking the time to read the proposal. There’s quite a lot in it, so please feel free to contact me if you want to discuss synchronously or need any further clarification.

lucacasonato Jan 2, 2022
Maintainer Author

But metadata is completely optional, and is only used to improve filtering in situations where you have the data available. Metadata should never be hand written.

lucasfcosta Jan 2, 2022

Definitely, I see that metadata should never be hand written. However, in situations where you will filter, you will write it. With a declarative approach, it could be avoided altogether. Furthermore, there would be other benefits to it (described above in more detail).

Btw if you think that metadata is useful for other situations or if there are any further concerns or disagreements with the approach and advantages above, please let me know. Overall, I found the declarative API to be more advantageous, but I may be missing other advantages you have in mind with the current approach.

lucacasonato Jan 2, 2022
Maintainer Author

My issue with a declarative first approach is twofold:

a) declarative tests are arguably a lot harder to read and comprehend.

This is because hooks don't follow the natural "top to bottom" semantics of "regular" code. Instead you need to create temporary variables in higher scopes, then assign to them in a callback in a different scope, then a test is called in yet another scope, and finally a finalizer is called to clean up (in yet another scope).

Regular JS knowledge does not carry over into a world of test hooks. You can't use try / catch / finally to handle errors and clean up. Additionally hooks based testing does not work well at all with explicit resource management for example.

There are also issues with TS typings for hook based tests, as the compiler can make fewer inferences because of the unknown ordering of calls.
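A small side-by-side sketch of the readability and typing points, using no test framework at all. `FakeDb`, `openFakeDb`, and the hook functions are hypothetical stand-ins; the hooks are called by hand here in the order a hook-based runner would call them.

```typescript
// Hypothetical resource used by both styles.
interface FakeDb {
  isOpen(): boolean;
  query(sql: string): string;
  close(): void;
}

function openFakeDb(): FakeDb {
  let open = true;
  return {
    isOpen: () => open,
    query: (sql: string) => {
      if (!open) throw new Error("db is closed");
      return `rows for ${sql}`;
    },
    close: () => {
      open = false;
    },
  };
}

// Hook style: the resource lives in an outer scope so that three separate
// callbacks can see it. TS cannot know the "before" hook ran first, so the
// variable is typed `FakeDb | undefined` and needs `!` at the use site.
let sharedDb: FakeDb | undefined;
const beforeEachHook = (): void => {
  sharedDb = openFakeDb();
};
const hookStyleTest = (): string => sharedDb!.query("SELECT 1");
const afterEachHook = (): void => {
  sharedDb?.close();
};

// Imperative style: setup, use, and cleanup read top to bottom in one scope,
// `db` is a plain `const` with a precise type, and try/finally does cleanup.
function imperativeStyleTest(): { result: string; db: FakeDb } {
  const db = openFakeDb();
  try {
    return { result: db.query("SELECT 1"), db };
  } finally {
    db.close();
  }
}

// Simulate what a hook-based runner would do.
beforeEachHook();
const hookResult = hookStyleTest();
afterEachHook();

const imperative = imperativeStyleTest();
```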

b) the imperative test API is lower level than the declarative API

You can emulate a declarative API on top of the low level imperative API very easily. For example you can write a mocha polyfill on top of the imperative API. It is also possible to polyfill other imperative testing APIs on top of our imperative API (for example node-tape or tap). It is not however possible to write an imperative testing framework on top of a declarative API. This is why the imperative API is the better low level primitive.
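As a rough illustration of that layering (not the actual mocha polyfill, and not the implementation of any existing library), here is a minimal describe/it layer built on nothing but a `step`-style method. `StepContext` is an assumed subset of the proposed interface (a `step(name, fn)` that resolves to a boolean and never rejects); `createBdd` and `recordingContext` are hypothetical helpers, the latter being a stand-in runner so the sketch can run outside Deno.

```typescript
// Assumed minimal shape of the imperative primitive.
interface StepContext {
  step(
    name: string,
    fn: (t: StepContext) => void | Promise<void>,
  ): Promise<boolean>;
}

type Entry =
  | { kind: "suite"; name: string; children: Entry[] }
  | { kind: "test"; name: string; fn: () => void | Promise<void> };

// Replays the declared tree as nested imperative steps.
async function runEntries(entries: Entry[], t: StepContext): Promise<void> {
  let allOk = true;
  for (const entry of entries) {
    let ok: boolean;
    if (entry.kind === "suite") {
      const children = entry.children;
      ok = await t.step(entry.name, (inner) => runEntries(children, inner));
    } else {
      const fn = entry.fn;
      ok = await t.step(entry.name, () => fn());
    }
    allOk = allOk && ok;
  }
  if (!allOk) throw new Error("a nested step failed");
}

function createBdd() {
  const root: Entry[] = [];
  let collecting: Entry[] = root;
  return {
    // Declaration only: builds a tree, runs nothing.
    describe(name: string, body: () => void): void {
      const children: Entry[] = [];
      collecting.push({ kind: "suite", name, children });
      const parent = collecting;
      collecting = children;
      try {
        body();
      } finally {
        collecting = parent;
      }
    },
    it(name: string, fn: () => void | Promise<void>): void {
      collecting.push({ kind: "test", name, fn });
    },
    run(t: StepContext): Promise<void> {
      return runEntries(root, t);
    },
  };
}

// Stand-in for the real runner: records each step's outcome.
function recordingContext(path: string[], log: string[]): StepContext {
  return {
    async step(name, fn) {
      const here = [...path, name];
      try {
        await fn(recordingContext(here, log));
        log.push(`ok ${here.join(" > ")}`);
        return true;
      } catch {
        log.push(`FAILED ${here.join(" > ")}`);
        return false;
      }
    },
  };
}

const bdd = createBdd();
bdd.describe("Sales Module", () => {
  bdd.it("confirmation popup for under 25", () => {});
  bdd.describe("checkout", () => {
    bdd.it("immediate checkout for over 25", () => {});
  });
});

const bddLog: string[] = [];
const bddDone: Promise<void> = bdd.run(recordingContext([], bddLog));
```

Under Deno, the intent would be to pass the real test context to `bdd.run` inside a `Deno.test` callback instead of `recordingContext`; that assumes the real context is structurally compatible with `StepContext`, which this sketch does not verify. Going the other way is the hard part: once tests are only declared, there is no way to recover arbitrary control flow between them.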

> However, in situations where you will filter, you will write it.

You should never write the metadata manually. The metadata field exists purely to give declarative testing frameworks that are written on top of Deno.test a way to give the test runner more data for filtering if they have this data available. For example a mocha polyfill would have a full view of all test data upfront, which would allow it to generate this metadata automatically with no extra effort or input from the user. The key thing is that the metadata does not affect how tests are run. It is completely optional and is only meant for display and/or filtering metadata.

lucasfcosta Jan 6, 2022

@lucacasonato that makes total sense, thanks a lot for the detailed explanation. My apologies for having you go through such a lengthy discussion. I do feel like these examples, especially the mocha polyfill you posted, have helped clarify the preference for an imperative API.

Thanks for also taking the time to explain the metadata question, I did misunderstand that initially.

The proposal looks really well thought through, brilliant work.

zandaqo · 2022-01-14T12:22:14Z

zandaqo
Jan 14, 2022

> For example you can write a mocha polyfill on top of the imperative API.

@lucacasonato maybe that "polyfill" would be a good addition to deno_std/testing? The Jest/Mocha BDD style is popular in JS world, and I doubt most of the users would care all that much about the implementation as long as they get familiar looking test files, so this would be a low overhead option as compared to porting/using mocha in compat mode.

7 replies

KyleJune Jan 14, 2022

Someone created an issue regarding the testing code lens being able to work with other functions earlier this week. I'm not sure if it will happen, but in test_suite I can focus tests by changing describe to fdescribe or it to fit, like the jasmine test runner has (https://jasmine.github.io/api/4.0/global.html#fit). So not having the testing code lens isn't too big of a deal for me.

denoland/vscode_deno#606

zandaqo Jan 17, 2022

@KyleJune Indeed, nicely done! My point is that perhaps the popularity of this style merits adding some barebone version of it to STD. Especially since mimicking describe/it with the current test API is rather straightforward. Third-party frameworks can in turn extend on top of it just like with the assertions.

In an ideal world, I'd like to use fully featured third-party testing frameworks like test_suite with my application code that I work on daily and don't mind updating often. Whereas for my libraries I'd prefer to limit dependencies simply to avoid updating them unduly: Jest had been a constant source of headache in this scenario.

KyleJune Jan 17, 2022

I agree it would be nice to have included but I think generally the Deno team doesn't want there to be multiple ways of doing the same thing built into Deno. I believe there was a thread in the past regarding different ways of writing tests and the conclusion was to leave that to third party libraries.

You wouldn't need to update your test deps every time a new minor release comes out as long as upgrading deno doesn't break it or your current version isn't missing new features you'd like to use.

The test step API will simplify grouping. Currently my library copies how Deno.test was working internally for assertions about resources and ops. Once it is using the test steps API, the library will become a lot thinner. Since Deno.test is a part of the runtime, if there are any bugs/fixes to it, you should automatically get them without having to update the third party module.

KyleJune Mar 27, 2022

I went ahead and made a PR for adding test_suite to std/testing as bdd.ts.

denoland/std#2067

KyleJune Apr 11, 2022

It's been merged, it will be available in std@0.135.0
