Test flakiness investigation and attempted fixes ❄ #3498

darrenburns · 2023-10-10T13:08:48Z

Here are my theories and the corresponding changes for each of the flaky tests identified by Rodrigo here: #3484 (comment).

Work-in-progress, but feel free to comment/discuss the theories.

✅ = I think I've fixed it

`test_schedule_reverse_animations` ✅

This test was essentially starting two timers, one line after the other, and assuming that the order in which the timers finished was guaranteed. It isn't.

Solution: I've modified the test to make it almost certain that the reverse animation begins to run while the forward animation is in progress.

`test_scheduling_animation` ✅

I think that the pause was actually detrimental here - the delay 0.1 and 0.9 * 0.1 are too close to each other, and so sometimes the execution didn't continue after the pause until the animation was already complete.

Solution: I opted to remove the pause here rather than reduce the 0.9 value, as I don't think the pause adds much.

`test_inactive_stack_is_alive` ✅

switch_mode leads to important asynchronous (queue-based) work being done, but it cannot be awaited.

When you perform any action after switching modes, you cannot be sure what state the application is in - you don't know, for example, if the root screen associated with the mode has been mounted yet.

Solution: I've made switch_mode return an AwaitMount, and updated the test to await it.

Edit:
Turns out the problem was simpler - the test was querying for a Label in the DOM every 0.1 seconds. Since the app was switching screens, there were small windows where there was no Label in the DOM, and this error wasn't accounted for.

I opted to remove this test because I think the approach it took was generally flawed and I don't think it gave much value.

`test_remove_tabs`

The problem is that the AwaitRemove returned by remove_tab only waits for the removal of the Tab from the DOM. The addition of the .-active class happens some time in the future. This ultimately means that we cannot safely use Tabs.active_tab in an application that supports tab removal.

Solution: I've added a more general object for optional awaits called AwaitComplete. remove_tab now returns an AwaitComplete which waits for both the DOM node being removed, and for the corresponding internal state updates.
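To make the "optional await" idea concrete, here is a minimal sketch using only asyncio. It is not the AwaitComplete implementation from this PR: the class name OptionalAwait and the helper coroutines remove_node and update_state are hypothetical, and the sketch assumes the object is created while an event loop is running.

```python
import asyncio


class OptionalAwait:
    """Hypothetical stand-in for an optionally awaitable object (not Textual's code)."""

    def __init__(self, *coroutines):
        # gather() schedules the coroutines straight away, so the work is
        # performed even if the caller never awaits this object.
        # Assumption: this is constructed while an event loop is running.
        self._future = asyncio.gather(*coroutines)

    def __await__(self):
        # Awaiting suspends the caller until every wrapped coroutine is done.
        return self._future.__await__()


async def remove_node(log):
    # Hypothetical: stands in for removing the Tab from the DOM.
    await asyncio.sleep(0.01)
    log.append("node removed")


async def update_state(log):
    # Hypothetical: stands in for the internal state update that follows.
    await asyncio.sleep(0.02)
    log.append("state updated")


async def demo():
    log = []
    # Awaiting guarantees both steps have finished before we inspect the log.
    await OptionalAwait(remove_node(log), update_state(log))
    return log
```

Callers that don't care about completion can drop the await and the work still runs; callers such as tests can await and then make assertions safely.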

`test_remove_tabs_messages`

This looks like it was closely related to the issue above. remove_tab was being called in a loop, which was ultimately resulting in many messages being dispatched after refresh.

Solution: Same as above. Also, the messages are no longer dispatched after refresh. I tested an example to see if this reintroduced an animation issue with the underline bar, but couldn't see any problems.

There's still some flakiness here I can't work out because it's rare:

It looks like _on_mount can, very occasionally, still be called before its children can be queried. I can’t see any other explanation after tracing through code as much as possible. I got 2 failures in a row within Tabs where the child Underline wasn’t being returned from a query inside Tabs._on_mount.

`test_textual_dev_border_preview` ✅

Looks like this may have been due to the button press animation.

The test code uses wait_for_scheduled_animations, but the Button widget doesn't use animations for the active/click/press effect.

`test_command_palette` ✅

This failure happened during a period where snapshot reports weren't being produced in CI, so it's not possible to see why this failure occurred.

There were 2 issues here:

The Input widget cursor blinking in the CommandPalette - all other snapshot tests which use an Input set cursor_blink = False.
The cursor_blink reactive didn't do anything when changed at runtime. It could only be set before the widget was actually mounted.

Solution: I've switched off the blinking cursor for this test. I've fixed the Input widget such that it handles toggling the cursor_blink value at runtime.

`test_input_suggestions` ✅

This failed while I was investigating the above issue - very similar to the above failure.

Solution: I've switched off the blinking cursor for this test. I also added a new watcher to Input which ensures that as soon as you set the cursor_blink to False, the cursor becomes visible again.
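The watcher behaviour described above can be sketched without Textual. This is a plain-Python model, not the Input widget's code: CursorModel, its blink property and the tick method are hypothetical names standing in for the reactive, its watcher and the blink timer callback.

```python
class CursorModel:
    """Hypothetical model of a blinking cursor; not the real Input widget."""

    def __init__(self, blink=True):
        self._blink = blink
        self.visible = True

    @property
    def blink(self):
        return self._blink

    @blink.setter
    def blink(self, value):
        self._blink = value
        self._watch_blink(value)

    def _watch_blink(self, blink):
        # If blinking is switched off while the cursor is in its hidden
        # phase, show it again immediately rather than leaving it hidden.
        if not blink:
            self.visible = True

    def tick(self):
        # Stands in for the blink timer callback: only toggles while blinking.
        if self._blink:
            self.visible = not self.visible
```

Without the watcher, a snapshot taken after switching blink off could still capture a hidden cursor, depending on where the timer was in its cycle.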

`test_directory_tree_reload_other_node` ✅

This is a problem that affects many DirectoryTree tests.

Loading and reloading of directories are done inside a worker. The test doesn't know when this worker is complete - yet it's not safe to use the DirectoryTree unless you know the worker is complete.

Solution: I've made the processing of the queue which performs the "reload" awaitable.

`test_app_with_notifications_that_expire` ✅

This test used time.sleep instead of asyncio.sleep, which blocked the event loop and likely contributed to flakiness by giving the notification timers less time to expire the notifications correctly.

Solution: I've reworked the timeouts used in the test so the notifications will time out much faster, and we'll wait for a long time (relative to their timeout) for them to be expired. Also switched to asyncio.sleep.
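The difference between the two sleeps can be shown with a small asyncio-only sketch. The function name timer_fired_during and the durations are illustrative and not taken from the test; the scheduled callback stands in for a notification's expiry timer.

```python
import asyncio
import time


async def timer_fired_during(pause):
    """Report whether a 0.05s timer fired while `pause(0.2)` was in progress."""
    fired = []
    loop = asyncio.get_running_loop()
    # Stands in for a notification's expiry timer.
    loop.call_later(0.05, fired.append, "expired")
    result = pause(0.2)
    if asyncio.iscoroutine(result):
        # asyncio.sleep yields to the event loop, so the timer can run.
        await result
    # time.sleep has already returned here without ever yielding, so the
    # event loop had no chance to run the timer callback.
    return bool(fired)
```

Calling asyncio.run(timer_fired_during(time.sleep)) reports that the timer never ran during the pause, while asyncio.run(timer_fired_during(asyncio.sleep)) reports that it did.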

`test_loading` ✅

There's a known issue where render can fire before on_mount. This means if you're initialising state inside on_mount instead of __init__, and your render method uses that state, there may be a crash.

This caused a crash in the test for LoadingIndicator.

Solution: The LoadingIndicator now initialises the state inside the constructor instead of in on_mount. This doesn't solve the root problem, but avoids a crash which can occur in this test (and presumably also in any real applications which use LoadingIndicator).
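A plain-Python sketch of why moving the initialisation helps. These classes are hypothetical and only mimic the shape of a widget; the attribute name _start_time is an assumption and is not taken from the LoadingIndicator source.

```python
import time


class MountInitialised:
    """State is created in on_mount, so an early render has nothing to read."""

    def on_mount(self):
        self._start_time = time.monotonic()

    def render(self):
        # Raises AttributeError if render is called before on_mount.
        return f"{time.monotonic() - self._start_time:.1f}s elapsed"


class ConstructorInitialised:
    """State is created in __init__, so render is safe at any point."""

    def __init__(self):
        self._start_time = time.monotonic()

    def render(self):
        return f"{time.monotonic() - self._start_time:.1f}s elapsed"
```

Calling render on a fresh MountInitialised instance fails, which mirrors the crash, while ConstructorInitialised renders regardless of whether on_mount has run yet.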

Other issues

Tabs.active wasn't being re-assigned when tabs get removed or cleared (fixes #3523, "Tabs still has active tab after cleared").
The cursor_blink reactive on Input didn't do anything when changed at runtime.

Fixes #3484

willmcgugan

Pre-review review. I think you're on the right track. These changes seem to be worthwhile, even if we didn't have the flaky tests issue.

src/textual/await_complete.py

rodrigogiraoserrao

I only found a couple of small things that could be improved.

There were also a couple of Return docstrings which I requested you indent. I recall doing this for longer docstrings and I think there was a reason we did it, so I suggested changes in line with that.

There are also a couple of AwaitComplete return types that are generic but whose type parameter you didn't fill in, so the type checkers will complain.

src/textual/app.py

src/textual/widgets/_data_table.py

rodrigogiraoserrao · 2023-10-24T12:12:22Z

src/textual/widgets/_directory_tree.py

@@ -152,7 +154,7 @@ def __init__(
        )
        self.path = path

-    def _add_to_load_queue(self, node: TreeNode[DirEntry]) -> None:
+    def _add_to_load_queue(self, node: TreeNode[DirEntry]) -> AwaitComplete:


The docstring is missing the return.
If I'm understanding this, the await complete we get is one that waits for the whole load queue to be processed, right?

I don't know how Queue works, but if you call join and then add more nodes to the load queue, won't the previously called join also wait for those nodes to be loaded?

~~You might be on to something - I thought join() returned a Future. I'll investigate this more.~~
(I misread)

Yes, if you call join and then add more nodes to the queue, the join will wait until the queue is completely empty. An alternative would be to post a "marker" message on to the queue and wait for that to be processed, then stop waiting - maybe that'd be better, since some people might be polling their filesystem for changes faster than the queue is processed.
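A rough asyncio sketch of the two waiting strategies discussed in this thread. The worker, the wait_for_marker helper and the use of an asyncio.Event as the marker are all illustrative assumptions, not code from DirectoryTree.

```python
import asyncio


async def worker(queue, processed):
    """Process queue items forever; an Event on the queue acts as a marker."""
    while True:
        item = await queue.get()
        try:
            if isinstance(item, asyncio.Event):
                item.set()  # Everything queued before the marker is done.
            else:
                await asyncio.sleep(0.05)  # Simulated directory load.
                processed.append(item)
        finally:
            queue.task_done()


async def wait_for_marker(queue):
    """Wait only for the work that was queued before this call."""
    marker = asyncio.Event()
    queue.put_nowait(marker)
    await marker.wait()


async def demo():
    queue = asyncio.Queue()
    processed = []
    task = asyncio.create_task(worker(queue, processed))
    queue.put_nowait("a")
    queue.put_nowait("b")
    waiter = asyncio.create_task(wait_for_marker(queue))
    await asyncio.sleep(0)  # Let the waiter enqueue its marker.
    queue.put_nowait("c")  # Queued after the marker.
    await waiter
    seen_at_marker = list(processed)
    await queue.join()  # join() also waits for the later item.
    seen_after_join = list(processed)
    task.cancel()
    return seen_at_marker, seen_after_join
```

In this sketch the marker wait returns once "a" and "b" are loaded, whereas join() only returns after "c" has been processed as well.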

src/textual/widgets/_directory_tree.py

src/textual/widgets/_tabbed_content.py

src/textual/widgets/_tabs.py

rodrigogiraoserrao · 2023-10-24T12:57:46Z

tests/test_screen_modes.py

@@ -172,47 +172,6 @@ async def test_screen_stack_preserved(ModesApp: Type[App]):
            await pilot.press("o")


-async def test_inactive_stack_is_alive():
-    """This tests that timers in screens outside the active stack keep going."""


Is it safe to remove this test?
Maybe it wasn't the best test, but I feel like it was testing a relevant thing.

I wasn't really sure how to rewrite this in a way that wouldn't be flaky - open to suggestions.

Probably with larger time intervals and explicitly pausing between mode switches and the final assert.
We also probably don't need to switch twice.
So, start in a mode that sets a timer to append something to a list or whatever.
Switch to another mode that does nothing.
Wait for long enough and make sure the initial list append happened.

willmcgugan · 2023-10-24T14:17:31Z

I'll review this after it has been Rodrigoed.

darrenburns · 2023-10-24T16:20:57Z

@willmcgugan Ready whenever.

willmcgugan

Nice work. Some very satisfying changes here (removing all those pauses). Some questions and suggestions.

CHANGELOG.md

src/textual/await_complete.py

src/textual/widgets/_data_table.py

src/textual/await_complete.py

src/textual/widgets/_tabbed_content.py

willmcgugan

A couple of things to consider, but LGTM

willmcgugan · 2023-10-25T12:52:17Z

src/textual/await_complete.py

+ReturnType = TypeVar("ReturnType")
+
+
+@rich.repr.auto(angular=True)


I wonder if it would be helpful if the repr included the coroutines.

Maybe see what it generates. If it looks like noise, we can leave it as is.

Wouldn't it already include the coroutines, since the parameter is named the same as the attribute?

Ah, you're right. What a nice feature. I must thank the dev who implemented that.

Curious what the repr will look like.

src/textual/await_complete.py

src/textual/widgets/_directory_tree.py

darrenburns added 4 commits October 10, 2023 14:02

Modifying two flaky animation tests, hopefully removing flakiness :)

6b02725

Make switch_mode return an AwaitMount

37ff2cf

Fix event issue

f113057

Merge branch 'main' of github.com:Textualize/textual into flaky-tests

be865f9

darrenburns changed the title ~~Test flakiness~~ Test flakiness investigation and attempted fixes ❄ Oct 12, 2023

darrenburns added 22 commits October 12, 2023 13:05

Add AwaitComplete - a more generalised optionally awaitable object

120feb0

Use AwaitComplete in Tabs.remove_tab() and update tests accordingly.

8525c53

Update TabbedContent to use AwaitComplete instead of AwaitTabbedContent

2539df9

Simplifying - dont use optional awaitables where not required

842bdf8

Update variable name

ab9f1cd

Update a comment

65d7853

Add watcher for cursor blink to ensure when blink is switched off, th…

0b4f3a9

…e cursor immediately becomes visible. Ensure we turn off cursor blink inside the input suggestions snapshot test.

Fix cursor blink reactive and disable cursor blink in the command pal…

3741f59

…ette snapshot test

More progress

b73f69c

Reworking AwaitComplete

86da626

Some more work on tabs flakiness/race-conditions

0faf7b2

Merge branch 'main' of github.com:willmcgugan/textual into flaky-tests

8ecbe20

Ensure active tab is set correctly

fa36cd4

Simplify next tab assignment

c0fef47

Simplify removing tabs logic

2df0584

Make button animation duration configurable; Switch off button animat…

5b0d78e

…ion in snapshot test

Remove a flawed test

cda2319

Add awaits in some tests

6c686ff

Docstrings

68f2d2e

Make active_effect_duration an instance attribute

d871c83

Fix a Tabs crash

710d6b5

Await the tree reload when the path changes in DirectoryTree

1a7feec

willmcgugan reviewed Oct 17, 2023

src/textual/await_complete.py

darrenburns added 2 commits October 17, 2023 12:18

Change AwaitComplete _instances class attr to a set from a list

b8428dd

Make AwaitComplete generic, AwaitComplete._wait_all is now private, a…

55c0950

…nd exposes timeout parameter

darrenburns requested review from davep and rodrigogiraoserrao October 24, 2023 11:01

Remove debugging prints

5642681

rodrigogiraoserrao suggested changes Oct 24, 2023

darrenburns added 4 commits October 24, 2023 14:11

Fix broken docstring, remove unused import

5dd2f81

Merge branch 'main' of github.com:willmcgugan/textual into flaky-tests

59572c7

Rename variable to make it clearer

62ac7e2

Add missing return type annotation

a7bb953

darrenburns and others added 3 commits October 24, 2023 16:45

Update src/textual/widgets/_tabbed_content.py

4b9bce3

Co-authored-by: Rodrigo Girão Serrão <5621605+rodrigogiraoserrao@users.noreply.github.com>

Update src/textual/widgets/_tabbed_content.py

0582f54

Co-authored-by: Rodrigo Girão Serrão <5621605+rodrigogiraoserrao@users.noreply.github.com>

Update src/textual/widgets/_tabs.py

65c97c8

Co-authored-by: Rodrigo Girão Serrão <5621605+rodrigogiraoserrao@users.noreply.github.com>

willmcgugan reviewed Oct 25, 2023

darrenburns added 8 commits October 25, 2023 11:55

Scroll datatable cursor after refresh

b4de92b

Add comment explaining use of call_after_refresh when scrolling data …

9ec4680

…table cursor into view

Merge branch 'flaky-tests' of github.com:willmcgugan/textual into fla…

15c308d

…ky-tests

Add repr to AwaitComplete (auto-repr_

547a918

Remove use of generics from AwaitComplete

6104481

Update changelog and improve docstring

d9feeda

Add a missing parameter from DirectoryTree.reset_node docstring.

1238870

Signed-off-by: Darren Burns <darrenb900@gmail.com>

Improve docstring in DirectoryTree

c5be6af

Signed-off-by: Darren Burns <darrenb900@gmail.com>

darrenburns requested review from rodrigogiraoserrao and willmcgugan October 25, 2023 12:49

willmcgugan approved these changes Oct 25, 2023

Rename parameter coroutine to coroutines in await_complete.py, since …

c238dac

…it's a variable length param.

rodrigogiraoserrao approved these changes Oct 25, 2023

src/textual/widgets/_directory_tree.py

darrenburns merged commit 34fb596 into main Oct 25, 2023
23 checks passed

darrenburns deleted the flaky-tests branch October 25, 2023 13:41

TomJGooding mentioned this pull request Oct 25, 2023

fix(tabbed content): handle pane containing tabs #3444

Closed

3 tasks
