Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test flakiness investigation and attempted fixes ❄ #3498

Merged
merged 61 commits into from
Oct 25, 2023
Merged

Conversation

darrenburns
Copy link
Member

@darrenburns darrenburns commented Oct 10, 2023

Here are my theories and the corresponding changes for each of the flaky tests identified by Rodrigo here: #3484 (comment).

Work-in-progress, but feel free to comment/discuss the theories.

✅ = I think I've fixed it

test_schedule_reverse_animations

This test essentially was starting two timers one line after the other, and assuming the order in which the timers finished was guaranteed. This isn't the case.

Solution: I've modified the test to make it almost certain that the reverse animation begins to run while the forward animation is in progress.

test_scheduling_animation

I think that the pause was actually detrimental here - the delay 0.1 and 0.9 * 0.1 are too close to each other, and so sometimes the execution didn't continue after the pause until the animation was already complete.

Solution: I opted to remove the pause here rather than reduce the 0.9 value, as I don't think the pause adds much.

test_inactive_stack_is_alive

switch_mode leads to important asynchronous (queue-based) work being done, but it cannot be awaited.

When you perform any action after switching modes, you cannot be sure what state the application is in - you don't know, for example, if the root screen associated with the mode has been mounted yet.

Solution: I've made switch_mode return an AwaitMount, and updated the test to await it.

Edit:
Turns out the problem was simpler - the test was querying for a Label in the DOM every 0.1 seconds. Since the app was switching screens, there were small windows where there was no Label in the DOM, and this error wasn't accounted for.

I opted to remove this test because I think the approach it took was generally flawed and I don't think it gave much value.

test_remove_tabs

The problem is that the AwaitRemove returned by remove_tab only waits for the removal of the Tab from the DOM. The addition of the .-active class happens some time in the future. This ultimately means that we cannot safely use Tabs.active_tab in an application that supports tab removal.

Solution: I've added a more general object for optional awaits called AwaitComplete. remove_tab now returns an AwaitComplete which waits for both the DOM node being removed, and for the corresponding internal state updates.

test_remove_tabs_messages

This looks like it was closely related to the issue above. remove_tab was being called in a loop, which was ultimately resulting in many messages being dispatched after refresh.

Solution: Same as above. Also, the messages are no longer dispatched after refresh. I tested an example to see if this reintroduced an animation issue with the underline bar, but couldn't see any problems.


There's still some flakiness here I can't work out because it's rare:

It looks like _on_mount can, very occasionally, still be called before its children can be queried. I can’t see any other explanation after tracing through code as much as possible. I got 2 failures in a row within Tabs where the child Underline wasn’t being returned from a query inside Tabs._on_mount.


test_textual_dev_border_preview

Looks like this may have been due to the button press animation.

The test code uses wait_for_scheduled_animations, but the Button widget doesn't use animations for the active/click/press effect.

test_command_palette

This failure happened during a period where snapshot reports weren't being produced in CI, so it's not possible to see why this failure occurred.

There were 2 issues here:

  1. The Input widget cursor blinking in the CommandPalette - all other snapshot tests which use an Input set cursor_blink = False.
  2. The cursor_blink reactive didn't do anything when changed at runtime. It could only be set before the widget was actually mounted.

Solution: I've switched off the blinking cursor for this test. I've fixed the Input widget such that it handles toggling the cursor_blink value at runtime.

test_input_suggestions

This failed during investigating the above issue - very similar to the above failure.

Solution: I've switched off the blinking cursor for this test. I also added a new watcher to Input which ensures that as soon as you set the cursor_blink to False, the cursor becomes visible again.

test_directory_tree_reload_other_node

This is a problem that affects many DirectoryTree tests.

Loading and reloading of directories are done inside a worker. The test doesn't know when this worker is complete - yet it's not safe to use the DirectoryTree unless you know the worker is complete.

Solution: I've made the processing of the queue which performs the "reload" awaitable.

test_app_with_notifications_that_expire

This test used time.sleep instead of asyncio.sleep which blocked the event loop and likely contributed to flakiness - giving less time for notification timers to expire the notifications correctly.

Solution: I've reworked the timeouts used in the test so the notifications will time out much faster, and we'll wait for a long time (relative to their timeout) for them to be expired. Also switched to asyncio.sleep.

test_loading

There's a known issue where render can fire before on_mount. This means if you're initialising state inside on_mount instead of __init__, and your render method uses that state, there may be a crash.

This caused a crash in the test for LoadingIndicator.

Solution: The LoadingIndicator now initialises the state inside the constructor instead of in on_mount. This doesn't solve the root problem, but avoids a crash which can occur in this test (and presumably also in any real applications which use LoadingIndictor).

Other issues

Fixes #3484

@darrenburns darrenburns changed the title Test flakiness Test flakiness investigation and attempted fixes ❄ Oct 12, 2023
Copy link
Collaborator

@willmcgugan willmcgugan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-review review. I think you're on the right track. These changes seem to be worthwhile, even if we didn't have the flaky tests issue.

src/textual/await_complete.py Show resolved Hide resolved
src/textual/await_complete.py Outdated Show resolved Hide resolved
src/textual/await_complete.py Outdated Show resolved Hide resolved
src/textual/await_complete.py Outdated Show resolved Hide resolved
Copy link
Contributor

@rodrigogiraoserrao rodrigogiraoserrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only found a couple of small things that could be improved.

There were also a couple of Return docstrings which I requested you indent. I recall doing this for longer docstrings and I think there was a reason we did it, so I suggested changes in line with that.

There are also a couple of return types AwaitComplete that are generics but that you didn't fill in and the typecheckers will complain.

src/textual/app.py Outdated Show resolved Hide resolved
src/textual/widgets/_data_table.py Outdated Show resolved Hide resolved
@@ -152,7 +154,7 @@ def __init__(
)
self.path = path

def _add_to_load_queue(self, node: TreeNode[DirEntry]) -> None:
def _add_to_load_queue(self, node: TreeNode[DirEntry]) -> AwaitComplete:
Copy link
Contributor

@rodrigogiraoserrao rodrigogiraoserrao Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring is missing the return.
If I'm understanding this, the await complete we get is one that waits for the whole load queue to be processed, right?

I don't know how Queue works, but if you call join and then add more nodes to the load queue, won't the previously called join also wait for those nodes to be loaded?

Copy link
Member Author

@darrenburns darrenburns Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be on to something - I thought join() returned a Future. I'll investigate this more.
(I misread)

Yes, if you call join and then add more nodes to the queue, the join will wait until the queue is completely empty. An alternative would be to post a "marker" message on to the queue and wait for that to be processed, then stop waiting - maybe that'd be better for since some people might be polling their filesystem for changes faster than the queue is processed.

src/textual/widgets/_directory_tree.py Show resolved Hide resolved
src/textual/widgets/_directory_tree.py Show resolved Hide resolved
src/textual/widgets/_tabbed_content.py Outdated Show resolved Hide resolved
src/textual/widgets/_tabbed_content.py Outdated Show resolved Hide resolved
src/textual/widgets/_tabs.py Outdated Show resolved Hide resolved
src/textual/widgets/_tabs.py Show resolved Hide resolved
@@ -172,47 +172,6 @@ async def test_screen_stack_preserved(ModesApp: Type[App]):
await pilot.press("o")


async def test_inactive_stack_is_alive():
"""This tests that timers in screens outside the active stack keep going."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to remove this test?
Maybe it wasn't the best test, but I feel like it was testing a relevant thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't really sure how to rewrite this in a way that wouldn't be flaky - open to suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably with larger time intervals and explicitly pausing between mode switches and the final assert.
We also probably don't need to switch twice.
So, start in a mode that sets a timer to append something to a list or whatever.
Switch to another mode that does nothing.
Wait for long enough and make sure the initial list append happened.

@willmcgugan
Copy link
Collaborator

I'll review this after it has been Rodrigoed.

darrenburns and others added 3 commits October 24, 2023 16:45
Co-authored-by: Rodrigo Girão Serrão <5621605+rodrigogiraoserrao@users.noreply.github.com>
Co-authored-by: Rodrigo Girão Serrão <5621605+rodrigogiraoserrao@users.noreply.github.com>
Co-authored-by: Rodrigo Girão Serrão <5621605+rodrigogiraoserrao@users.noreply.github.com>
@darrenburns
Copy link
Member Author

@willmcgugan Ready whenever.

Copy link
Collaborator

@willmcgugan willmcgugan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Some very satisfying changes here (removing all those pauses). Some questions and suggestions.

CHANGELOG.md Outdated Show resolved Hide resolved
src/textual/await_complete.py Outdated Show resolved Hide resolved
src/textual/await_complete.py Outdated Show resolved Hide resolved
src/textual/widgets/_data_table.py Outdated Show resolved Hide resolved
src/textual/await_complete.py Outdated Show resolved Hide resolved
src/textual/widgets/_tabbed_content.py Show resolved Hide resolved
Copy link
Collaborator

@willmcgugan willmcgugan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of things to consider, but LGTM

ReturnType = TypeVar("ReturnType")


@rich.repr.auto(angular=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be helpful if the repr included the coroutines.

Maybe see what it generates. If it looks like noise, we can leave it as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it already include the coroutines, since the parameter is named the same as the attribute?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right. What a nice feature. I must thank the dev who implemented that.

Curious what the repr will look like.

src/textual/await_complete.py Outdated Show resolved Hide resolved
@darrenburns darrenburns merged commit 34fb596 into main Oct 25, 2023
23 checks passed
@darrenburns darrenburns deleted the flaky-tests branch October 25, 2023 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tabs still has active tab after cleared Revisit flaky tests
3 participants