Attempt to reduce test flakiness on Windows #1845

Merged
merged 2 commits into main from unflake-windows
Oct 24, 2024

pietern (Contributor) commented Oct 21, 2024

Changes

Test failures indicate that both stdout and stderr are consumed, yet the content of stdout doesn't end up in the combined output buffer. This can happen if the goroutines that copy stdout and stderr write to the same underlying buffer concurrently.

Example failure:

=== RUN   TestBackgroundCombinedOutput
    background_test.go:65: 
        	Error Trace:	D:/a/cli/cli/libs/process/background_test.go:65
        	Error:      	elements differ
        	            	
        	            	extra elements in list A:
        	            	([]interface {}) (len=1) {
        	            	 (string) (len=1) "2"
        	            	}
        	            	
        	            	
        	            	listA:
        	            	([]string) (len=2) {
        	            	 (string) (len=1) "1",
        	            	 (string) (len=1) "2"
        	            	}
        	            	
        	            	
        	            	listB:
        	            	([]string) (len=1) {
        	            	 (string) (len=1) "1"
        	            	}
        	Test:       	TestBackgroundCombinedOutput

With the test body:

func TestBackgroundCombinedOutput(t *testing.T) {
	ctx := context.Background()
	buf := bytes.Buffer{}
	res, err := Background(ctx, []string{
		"python3", "-c", "import sys, time; " +
			`sys.stderr.write("1\n"); sys.stderr.flush(); ` +
			"time.sleep(0.001); " +
			"print('2', flush=True); sys.stdout.flush(); " +
			"time.sleep(0.001)",
	}, WithCombinedOutput(&buf))
	assert.NoError(t, err)
	assert.Equal(t, "2", strings.TrimSpace(res))
	// The order of stdout and stderr being read into the buffer
	// for combined output is not deterministic due to scheduling
	// of the underlying goroutines that consume them.
	// That's why this asserts on the contents and not the order.
	assert.ElementsMatch(t, []string{"1", "2"}, splitLines(buf.Bytes()))
}
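
The test calls a splitLines helper that is not shown in this description. For readers following along, here is a minimal sketch of what such a helper might look like, assuming it splits on newlines (tolerating Windows "\r\n" endings) and drops empty lines. The name splitNonEmptyLines is hypothetical and is not the repository's actual helper.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// splitNonEmptyLines is a hypothetical stand-in for the splitLines helper
// used by the test; the real helper is not shown in this PR description.
// bufio.Scanner's default split function (ScanLines) strips the trailing
// "\n" and an optional preceding "\r", so Windows line endings are handled.
func splitNonEmptyLines(b []byte) []string {
	var lines []string
	sc := bufio.NewScanner(bytes.NewReader(b))
	for sc.Scan() {
		if line := sc.Text(); line != "" {
			lines = append(lines, line)
		}
	}
	// Scanner errors (for example, an over-long line) are ignored in this sketch.
	return lines
}

func main() {
	fmt.Printf("%q\n", splitNonEmptyLines([]byte("1\r\n2\r\n"))) // ["1" "2"]
	fmt.Printf("%q\n", splitNonEmptyLines([]byte("2\n1\n")))     // ["2" "1"]
}
```

Because the two streams can arrive in either order, comparing the resulting slice with assert.ElementsMatch (as the test does) checks the contents without depending on the order.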

With the implementation of WithCombinedOutput:

cli/libs/process/opts.go

Lines 72 to 78 in ca45e53

func WithCombinedOutput(buf *bytes.Buffer) execOption {
	return func(_ context.Context, c *exec.Cmd) error {
		c.Stdout = io.MultiWriter(buf, c.Stdout)
		c.Stderr = io.MultiWriter(buf, c.Stderr)
		return nil
	}
}

Notice that c.Stdout does receive the "2"; otherwise the test failure would have included the assertion error for res. This leads me to believe that the two goroutines writing to c.Stdout and c.Stderr race on their writes to buf.
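
For context, the os/exec documentation states that when Stdout and Stderr are the same writer and have a type that can be compared with ==, at most one goroutine at a time will call Write. Wrapping each stream in its own io.MultiWriter produces two distinct writers, so that guarantee does not cover the shared buffer, and bytes.Buffer is not safe for concurrent writes on its own. One common way to guard against this kind of race is to put a mutex in front of the shared buffer. The sketch below illustrates that idea only; it is not taken from this PR's diff, and lockedWriter and combinedOutput are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// lockedWriter is a hypothetical helper (not taken from this PR's diff).
// It serializes calls to Write so that several goroutines can share one
// underlying writer, such as a bytes.Buffer, without racing.
type lockedWriter struct {
	mu sync.Mutex
	w  io.Writer
}

func (l *lockedWriter) Write(p []byte) (int, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.w.Write(p)
}

// combinedOutput mimics the shape of WithCombinedOutput: two distinct
// MultiWriters stand in for c.Stdout and c.Stderr, and both forward to
// the same mutex-guarded buffer. io.Discard stands in for the original
// stdout and stderr destinations.
func combinedOutput() string {
	var buf bytes.Buffer
	shared := &lockedWriter{w: &buf}
	stdout := io.MultiWriter(shared, io.Discard)
	stderr := io.MultiWriter(shared, io.Discard)

	// Two goroutines write at roughly the same time, as the copying
	// goroutines started by os/exec might.
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); io.WriteString(stderr, "1\n") }()
	go func() { defer wg.Done(); io.WriteString(stdout, "2\n") }()
	wg.Wait()

	return buf.String()
}

func main() {
	// Both lines are always present; their order may vary between runs.
	fmt.Print(combinedOutput())
}
```

The lock guarantees only that each Write lands in the buffer intact. It does not impose an order between the two streams, which is consistent with the test asserting on contents rather than order.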

Tests

The test passes. Whether this PR has the intended effect remains to be seen.

shreyas-goenka (Contributor) left a comment

Just curious, why is this only a problem on Windows? It does not seem to happen on macOS (I ran the test 1000 times on my machine).

pietern (Contributor, Author) commented Oct 24, 2024

@shreyas-goenka It is possible the goroutines for stdout and stderr are woken up at the same time on Windows due to some interactions at the I/O notification level. I expect that on other platforms they are woken up sequentially (due to the tiny sleep in the Python code).

@pietern pietern added this pull request to the merge queue Oct 24, 2024
Merged via the queue into main with commit eddadda Oct 24, 2024
5 checks passed
@pietern pietern deleted the unflake-windows branch October 24, 2024 12:09
andrewnester added a commit that referenced this pull request Oct 30, 2024
New features for Databricks Asset Bundles:
This release adds support for managing AI/BI dashboards as part of your bundle configuration. The `bundle generate` command is updated to support producing dashboard bundle configuration as well as dashboard payloads.
You can find an example configuration and walkthrough at https://github.com/databricks/bundle-examples/tree/main/knowledge_base/dashboard_nyc_taxi

Bundles:
 * Add support for AI/BI dashboards ([#1743](#1743)).
 * Added validator for folder permissions ([#1824](#1824)).
 * Add bundle generate variant for dashboards ([#1847](#1847)).
 * Use SetPermissions instead of UpdatePermissions when setting folder permissions based on top-level ones ([#1822](#1822)).

Internal:
 * Attempt to reduce test flakiness on Windows ([#1845](#1845)).
 * Reuse resource resolution code for the run command ([#1858](#1858)).
 * [Internal] Automatically trigger integration tests on PR ([#1857](#1857)).
 * Add privacy notice to README ([#1841](#1841)).
 * [Internal] Add test instructions for external contributors ([#1863](#1863)).
 * Add `libs/dyn/jsonsaver` ([#1862](#1862)).

Dependency updates:
 * Bump github.com/fatih/color from 1.17.0 to 1.18.0 ([#1861](#1861)).
github-merge-queue bot pushed a commit that referenced this pull request Oct 30, 2024
**New features for Databricks Asset Bundles:**

This release adds support for managing AI/BI dashboards as part of your
bundle configuration. The `bundle generate` command is updated to
support producing dashboard bundle configuration as well as a serialized
JSON representation of the dashboard.
You can find an example configuration and walkthrough at
https://github.com/databricks/bundle-examples/tree/main/knowledge_base/dashboard_nyc_taxi

CLI:
* Add privacy notice to README
([#1841](#1841)).

Bundles:
* Add support for AI/BI dashboards
([#1743](#1743)).
* Added validator for folder permissions
([#1824](#1824)).
* Add bundle generate variant for dashboards
([#1847](#1847)).
* Use SetPermissions instead of UpdatePermissions when setting folder
permissions based on top-level ones
([#1822](#1822)).

Internal:
* Attempt to reduce test flakiness on Windows
([#1845](#1845)).
* Reuse resource resolution code for the run command
([#1858](#1858)).
* [Internal] Automatically trigger integration tests on PR
([#1857](#1857)).
* [Internal] Add test instructions for external contributors
([#1863](#1863)).
* Add `libs/dyn/jsonsaver`
([#1862](#1862)).


Dependency updates:
* Bump github.com/fatih/color from 1.17.0 to 1.18.0
([#1861](#1861)).

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>