
Make cargo test usable for nearcore developers #4490

Closed
matklad opened this issue Jul 9, 2021 · 15 comments
Labels
  • A-testing (Area: Unit testing / integration testing)
  • C-housekeeping (Category: Refactoring, cleanups, code quality)
  • Node (Node team)
  • P-low (Priority: low)
  • T-node (Team: issues relevant to the node experience team)

Comments


matklad commented Jul 9, 2021

I've noticed that I never run cargo test --workspace locally. I do want to be able to quickly check whether I broke something somewhere, but the usability of our current cargo test is abysmal.

Specifically:

  • Tests output a lot to the terminal, making the scrollback buffer useless.
  • The test suite is prohibitively slow: it takes 15 wall-clock minutes / 12 CPU minutes on a 24-core machine.

I suggest the following immediate action items:

  • fixing the tests so that the output of cargo test is never polluted unless a test actually fails. I think using TestWriter is one possible way to do this.
  • moving all obviously slow tests to expensive_tests.
  • revisiting this issue after these easy wins are achieved.

Longer term, I think it might make sense to do the following:

  • replace expensive_tests with a run-time check (see the example at the end of the "Make Tests Fast" section of this post; a minimal sketch follows below).
  • add two tiers of expensive_tests -- nightly_only_tests (> 5 min) and ci_only_tests (> 10 s, < 5 min). Local cargo t skips ci_only_tests by default, but CI runs them for every commit that gets into master.
  • document the expectations about test run-time and actively fix regressions.
  • aim for a really short test turn-around time. To give an aspirational goal: in rust-analyzer, cargo t takes 5 s wall-clock / 35 s CPU by default and 30 s / 100 s when including CI-only tests.
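(For illustration, a minimal sketch of the run-time check approach, assuming a hypothetical NEAR_EXPENSIVE_TESTS opt-in variable; this is not existing nearcore code:)

    #[test]
    fn test_expensive_sync() {
        // Hypothetical opt-in: bail out unless the variable is set.
        if std::env::var("NEAR_EXPENSIVE_TESTS").is_err() {
            // Note: the test still reports "ok" when skipped this way --
            // a pitfall discussed later in this thread.
            return;
        }
        // ... actual expensive test body ...
    }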
@matklad added the C-housekeeping, A-testing, and T-node labels on Jul 9, 2021
@janewang added the P-low label on Aug 11, 2021
stale bot commented Nov 9, 2021

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

stale bot added the S-stale label on Nov 9, 2021
bowenwang1996 commented Nov 9, 2021

@matklad this is fixed, right?

stale bot removed the S-stale label on Nov 9, 2021

matklad commented Nov 10, 2021

Not at all:

  • running cargo test still outputs megabytes of DEBUG-level logging to the terminal,
  • it still takes 15 minutes to complete on a 12-core machine.


pmnoxx commented Jan 3, 2022

@matklad The solution to the excessive output is to run the tests with
RUST_LOG=warn cargo test --workspace.

Setting the proper environment variable eliminates the issue.
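(For context: RUST_LOG here is read by tracing_subscriber's EnvFilter, which the test logger uses, as the diff later in this thread shows; setting it to warn filters out the INFO- and DEBUG-level messages that flood the terminal.)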


matklad commented Jan 5, 2022

Seems reasonable to flip the default? Run with RUST_LOG=INFO if you need the logs, but get no unnecessary verbose output by default. Generally you only want to see logs for failing tests.


pmnoxx commented Jan 5, 2022

Seems reasonable to flip the default? Run with RUST_LOG=INFO if you need the logs, but get no unnecessary verbose output by default. Generally you only want to see logs for failing tests.

I agree, it's a good idea to switch the default for tests. However, we should then update the CI to run with RUST_LOG=INFO. Otherwise, you won't see errors on test failure.

Also, changing the default would change the default settings for the neard binary.
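(A minimal sketch of what flipping the default could look like, assuming the test logger builds its EnvFilter in one helper; illustrative, not the actual nearcore code:)

    use tracing_subscriber::EnvFilter;

    fn default_test_filter() -> EnvFilter {
        // Honour RUST_LOG when it is set, but fall back to warnings only
        // instead of DEBUG-level output when it is not.
        EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("warn"))
    }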


mina86 commented Jan 5, 2022

Generally you only want to see logs for failing tests.

Yes, this is the issue. When I run tests, all I want to see is test foo::bar::baz ... ok a bunch of times. Logs should only be printed on failure. And to be honest, I don’t understand why this isn’t happening; isn’t the test runner supposed to capture the output of each test? Why are logs still visible?


mina86 commented Jan 7, 2022

Turns out libtest is broken: rust-lang/rust#90785


matklad commented Jan 7, 2022

diff --git a/test-utils/logger/src/lib.rs b/test-utils/logger/src/lib.rs
index 7372b5308..6be107180 100644
--- a/test-utils/logger/src/lib.rs
+++ b/test-utils/logger/src/lib.rs
@@ -20,7 +20,7 @@ fn setup_subscriber_from_filter(mut env_filter: EnvFilter) {
     let _ = tracing_subscriber::fmt::Subscriber::builder()
         .with_span_events(tracing_subscriber::fmt::format::FmtSpan::CLOSE)
         .with_env_filter(env_filter)
-        .with_writer(std::io::stderr)
+        .with_writer(tracing_subscriber::fmt::TestWriter::new())
         .try_init();
 }
 

fixes the issue (a bit swamped rn to submit an actual PR)
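(Context: `TestWriter` writes through the standard `print!` macro, which libtest's per-test output capture does intercept, unlike direct writes to `std::io::stderr`; see rust-lang/rust#90785.)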


pmnoxx commented Jan 7, 2022

fixes the issue (a bit swamped rn to submit an actual PR)

Thanks, I tested it. This works fine: #6023

mina86 added a commit to mina86/nearcore that referenced this issue Jan 7, 2022
Rust’s `libtest` has a… limitation where it doesn’t capture data
written directly to `std::io::stdout` or `std::io::stderr` (see
rust-lang/rust#90785).  Since we set up the test logger to use the
latter, none of the log messages are captured by the test harness.

The solution is to use `TestWriter` as the writer instead, which
cooperates with `libtest` better.

Issue: near#4490

mina86 commented Jan 7, 2022

fixes the issue (a bit swamped rn to submit an actual PR)

Indeed, #6025

replace expensive_tests with run-time check (see the example at the end of "Make Tests Fast" section of this post).

The issue with that approach is that if you forget to set the environment variable, you end up with a passing test that doesn’t test anything. This is another limitation of libtest, which does not allow returning an ignore state from a test; see rust-lang/rust#68007. The issue mentions the test-with crate [update: test-with is not fit for purpose] and an RFC, so I’m looking at those now.

near-bulldozer bot pushed a commit that referenced this issue Jan 7, 2022
Change logger to print output of `cargo test` correctly.

#4490 (comment)
mina86 added a commit that referenced this issue Jan 9, 2022
Firstly, the proper way to denote a test as expensive is to use a
`#[cfg]` directive before its definition rather than adding a return
inside the function.  The latter causes the test to show up in the
list of tests and always pass, giving people a false sense that
running the test without the `expensive_tests` feature actually does
something.

After fixing `sync_state_nodes_multishard`, also add it to the
nightly run.

Secondly, the `check_nightly.py` script is rather crude, and the
order in which the `#[cfg]` and `#[test]` attributes are applied does
matter.  With `#[test]` first, the script failed to recognise
`test_catchup` as an expensive test.

Thirdly, the `check_nightly.py` script also doesn’t understand a
`#[cfg]` matching multiple features.  Because of that it does not
detect `test_highload` as an expensive test either.  Since we’re not
using the `regression_tests` feature for anything, simply remove it.

Issue: #4490
mina86 added a commit that referenced this issue Jan 9, 2022
Introduce a `#[nightly_test]` attribute to be used to mark tests as
expensive (i.e. to be run nightly on NayDuck but omitted from casual
`cargo test` invocations or CI runs).

It replaces the use of the `#[cfg(feature = "expensive_tests")]`
directive and under the hood marks tests as ignored rather than
skipping their compilation.

This means that while previously nightly tests would be built only if
the `expensive_tests` feature was set, they are now always compiled.
This further means that changes that break such tests will be caught
by CI (previously they would break only once NayDuck decided to run
them), and there is no longer a need to conditionally compile helper
functions which are used only by such tests.

Issue: #4490
near-bulldozer bot pushed a commit that referenced this issue Jan 10, 2022
mina86 added a commit that referenced this issue Jan 10, 2022
Rather than using `cfg` to not compile expensive tests, use `cfg_attr`
to conditionally mark such tests as ignored.  In other words, instead
of writing:

    #[cfg(feature = "expensive_tests")]
    #[test]
    fn test_clear_old_data_too_many_heights() {
        // ...
    }

write:

    #[test]
    #[cfg_attr(not(feature = "expensive_tests"), ignore)]
    fn test_clear_old_data_too_many_heights() {
        // ...
    }

With this change, expensive tests will always be built, which means
that i) any code changes breaking them will be caught by CI (rather
than having to wait for a nightly run) and ii) code used only by
expensive tests is no longer unused when the `expensive_tests`
feature is not enabled (which means fewer
`#[cfg(feature = "expensive_tests")]` directives sprinkled throughout
the code).

Since we no longer mark whole modules as compiled only if the feature
is enabled, this change also means that each individual test needs to
be marked (rather than being able to mark a whole module).  This
makes it more obvious which tests are expensive and which aren’t
(since the marking is right at the test definition site) and
simplifies the `check_nightly.py` script.

Issue: #4490
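(With this scheme the expensive tests can still be run locally, either by enabling the feature, cargo test --features expensive_tests, or through libtest's standard flag for ignored tests, cargo test -- --ignored.)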
mina86 added a commit that referenced this issue Jan 10, 2022

matklad commented Jan 13, 2022

#5817 is related -- tests not being flaky is part of cargo test being usable out of the box.

near-bulldozer bot pushed a commit that referenced this issue Jan 13, 2022
near-bulldozer bot pushed a commit that referenced this issue Jan 14, 2022

matklad commented Jan 25, 2022

FWIW, the situation is much improved: cargo test now takes only 4 minutes locally! Thanks @mina86 and @pmnoxx!

@matklad closed this as completed on Jan 25, 2022

matklad commented Jan 25, 2022

For the future: running cargo test -- -Z unstable-options --report-time lets us find the slowest remaining tests.
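(`--report-time` is gated behind libtest's `-Z unstable-options`; it prints each test's execution time next to its result, making the slowest tests easy to spot.)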


pmnoxx commented Jan 25, 2022

@matklad Thanks, that's really useful. I'll add it to my scripts.

@gmilescu added the Node label on Oct 19, 2023