Address flaky tests in CircleCI #1771

Closed
kyriediculous opened this issue Feb 19, 2021 · 2 comments · Fixed by #1791

@kyriediculous
Contributor

Describe the bug
Push tests occasionally fail in CI when run with the -race flag to check for data races.

```
+ go test -run Push_ -race
E0219 21:04:06.195031   15825 broadcast.go:672] Playlist insertion error nonce=7 manifestID=mani seqNo=17 err=segment already exists
E0219 21:04:06.207787   15825 broadcast.go:672] Playlist insertion error nonce=7 manifestID=mani seqNo=15 err=segment already exists
E0219 21:04:06.211869   15825 broadcast.go:672] Playlist insertion error nonce=7 manifestID=mani seqNo=12 err=segment already exists
E0219 21:04:06.213858   15825 mediaserver.go:703] Error reading http request body: invalid argument
E0219 21:04:06.214340   15825 mediaserver.go:729] Bad URL url=http://example.com/live/.ts
E0219 21:04:06.217232   15825 broadcast.go:413] Error inserting segment nonce=4270412215396280914 seqNo=1: segment already exists
E0219 21:04:06.274559   15825 segment_rpc.go:432] Unable to submit segment orch=https://127.0.0.1:33687 nonce=4928210163146058599 manifestID=seg sessionID=bar seqNo=0 orch=https://127.0.0.1:33687 err=Post "https://127.0.0.1:33687/segment": dial tcp 127.0.0.1:33687: connect: connection refused
E0219 21:04:06.276156   15825 segment_rpc.go:432] Unable to submit segment orch=https://127.0.0.1:33687 nonce=4928210163146058599 manifestID=seg sessionID=bar seqNo=1 orch=https://127.0.0.1:33687 err=Post "https://127.0.0.1:33687/segment": dial tcp 127.0.0.1:33687: connect: connection refused
E0219 21:04:06.279519   15825 segment_rpc.go:432] Unable to submit segment orch=https://127.0.0.1:33687 nonce=14061048399174835305 manifestID=new sessionID=bar seqNo=0 orch=https://127.0.0.1:33687 err=Post "https://127.0.0.1:33687/segment": dial tcp 127.0.0.1:33687: connect: connection refused
E0219 21:04:06.284238   15825 segment_rpc.go:432] Unable to submit segment orch=https://127.0.0.1:33687 nonce=6257648816308001904 manifestID=intweb sessionID=bar seqNo=0 orch=https://127.0.0.1:33687 err=Post "https://127.0.0.1:33687/segment": dial tcp 127.0.0.1:33687: connect: connection refused
E0219 21:04:06.286688   15825 segment_rpc.go:432] Unable to submit segment orch=https://127.0.0.1:33687 nonce=6257648816308001904 manifestID=intweb sessionID=bar seqNo=1 orch=https://127.0.0.1:33687 err=Post "https://127.0.0.1:33687/segment": dial tcp 127.0.0.1:33687: connect: connection refused
E0219 21:04:06.292965   15825 segment_rpc.go:432] Unable to submit segment orch=https://127.0.0.1:33687 nonce=17138668931496181097 manifestID=intmid sessionID=bar seqNo=1 orch=https://127.0.0.1:33687 err=Post "https://127.0.0.1:33687/segment": dial tcp 127.0.0.1:33687: connect: connection refused
--- FAIL: TestPush_ShouldRemoveSessionAfterTimeoutIfInternalMIDIsUsed (0.06s)
    push_test.go:590: 
        	Error Trace:	push_test.go:590
        	Error:      	Not equal: 
        	            	expected: "intmid"
        	            	actual  : ""
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-intmid
        	            	+
        	Test:       	TestPush_ShouldRemoveSessionAfterTimeoutIfInternalMIDIsUsed
    push_test.go:591: 
        	Error Trace:	push_test.go:591
        	Error:      	Should be true
        	Test:       	TestPush_ShouldRemoveSessionAfterTimeoutIfInternalMIDIsUsed
E0219 21:04:06.352028   15825 segment_rpc.go:432] Unable to submit segment orch=https://127.0.0.1:33687 nonce=15591855920356351262 manifestID=mani3 sessionID=bar seqNo=1 orch=https://127.0.0.1:33687 err=Post "https://127.0.0.1:33687/segment": dial tcp 127.0.0.1:33687: connect: connection refused
--- FAIL: TestPush_ShouldRemoveSessionAfterTimeout (0.06s)
    push_test.go:619: 
        	Error Trace:	push_test.go:619
        	Error:      	Should be true
        	Test:       	TestPush_ShouldRemoveSessionAfterTimeout
```

@yondonfu
Member

There was a recent commit that increased sleep times in some tests. If you're encountering these types of failing tests, can you make sure to rebase off of master?

@yondonfu
Member

Update: There have been a lot of flaky tests in CircleCI recently (race tests, tests involving sleeps, CLI tests, etc.). This may be because our CircleCI tests run on a slower machine. The flaky tests are a pain for PR reviews because in many cases it is not clear whether the changes in a PR or the existing flaky tests are triggering the failures. It does not help that the failure output in CircleCI is usually not very informative: in many cases the stdout/stderr logs are displayed, but the failed assertion is not immediately visible.

I think the next steps here should be to:

  • Address the flaky tests either by making sure tests run on a faster machine and/or updating tests to eliminate flaky behavior
  • Figure out how to make failed assertions immediately observable in CircleCI so PR reviewers/contributors don't have to spend time hunting down the source

I've updated the OP title to reflect that this is an issue not just with the push tests.

@yondonfu changed the title from "Push tests flaky data race" to "Address flaky tests in CircleCI" on Feb 25, 2021
@yondonfu self-assigned this on Mar 10, 2021