Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.Collections.Concurrent.Tests timeout failures in CI #37186

Open
tarekgh opened this issue May 29, 2020 · 10 comments
Open

System.Collections.Concurrent.Tests timeout failures in CI #37186

tarekgh opened this issue May 29, 2020 · 10 comments
Labels
area-System.Collections runtime-mono specific to the Mono runtime
Milestone

Comments

@tarekgh
Copy link
Member

tarekgh commented May 29, 2020

net5.0-Linux-Debug-x64-Mono_release-RedHat.7.Amd64.Open https://helix.dot.net/api/2019-06-17/jobs/7c3e1020-fb91-4832-b4db-c0e1df873655/workitems/System.Collections.Concurrent.Tests/console
https://dev.azure.com/dnceng/public/_build/results?buildId=664284&view=logs&j=b4344b0d-0f92-5d69-ccaf-e0b24fbf14a2&t=555ea487-e6f6-5e9e-ac96-b1fa34f2540d&l=92

Executed on a003A66
+ ./RunTests.sh --runtime-path /home/helixbot/work/AA2E093E/p
----- start Fri May 29 21:41:25 UTC 2020 =============== To repro directly: =====================================================
pushd .
/home/helixbot/work/AA2E093E/p/dotnet exec --runtimeconfig System.Collections.Concurrent.Tests.runtimeconfig.json --depsfile System.Collections.Concurrent.Tests.deps.json xunit.console.dll System.Collections.Concurrent.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
~/work/AA2E093E/w/9E1908B4/e ~/work/AA2E093E/w/9E1908B4/e
  Discovering: System.Collections.Concurrent.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Collections.Concurrent.Tests (found 867 of 876 test cases)
  Starting:    System.Collections.Concurrent.Tests (parallel test collections = on, max threads = 2)
   System.Collections.Concurrent.Tests: [Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.AddTakeWithAtLeastOneElementInCollection_IsEmpty_AlwaysFalse', Elapsed: 00:02:01
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:02:02
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:02:02
   System.Collections.Concurrent.Tests: [Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.AddTakeWithAtLeastOneElementInCollection_IsEmpty_AlwaysFalse', Elapsed: 00:04:01
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:04:02
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:04:02
   System.Collections.Concurrent.Tests: [Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.AddTakeWithAtLeastOneElementInCollection_IsEmpty_AlwaysFalse', Elapsed: 00:06:01
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:06:02
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:06:02
   System.Collections.Concurrent.Tests: [Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.AddTakeWithAtLeastOneElementInCollection_IsEmpty_AlwaysFalse', Elapsed: 00:08:01
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:08:02
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:08:02
   System.Collections.Concurrent.Tests: [Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.AddTakeWithAtLeastOneElementInCollection_IsEmpty_AlwaysFalse', Elapsed: 00:10:01
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:10:02
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:10:02
   System.Collections.Concurrent.Tests: [Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.AddTakeWithAtLeastOneElementInCollection_IsEmpty_AlwaysFalse', Elapsed: 00:12:01
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:12:02
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:12:02
   System.Collections.Concurrent.Tests: [Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.AddTakeWithAtLeastOneElementInCollection_IsEmpty_AlwaysFalse', Elapsed: 00:14:01
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:02
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:02
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels May 29, 2020
@ghost
Copy link

ghost commented May 29, 2020

Tagging subscribers to this area: @eiriktsarpalis
Notify danmosemsft if you want to be subscribed.

@tarekgh tarekgh changed the title System.Collections.Concurrent.Tests failures in CI System.Collections.Concurrent.Tests timeout failures in CI May 29, 2020
@jkotas jkotas added the runtime-mono specific to the Mono runtime label May 30, 2020
@eiriktsarpalis eiriktsarpalis added this to the 5.0.0 milestone Jun 24, 2020
@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Jun 24, 2020
@eiriktsarpalis eiriktsarpalis self-assigned this Jul 2, 2020
@danmoseley
Copy link
Member

danmoseley commented Aug 2, 2020

@wfurt could you please help me find these timeouts in Kusto? I've tried for a while. I assume I should find Result="Fail" (I don't see a "TimedOut") but I don't find any failures on 5/29. If I look at Duration, eg

Jobs
| join kind=inner TestResults on JobId
| where Type1 startswith "System.Collections.Concurrent.Tests"
| where Result == "Fail"
| where Duration > 50

I only get 4 hits, and those were test failures rather than job failures.

I'm guessing I need to lookin a different way to find jobs that timed out.

@wfurt
Copy link
Member

wfurt commented Aug 3, 2020

WorkItems
| where FriendlyName == "System.Collections.Concurrent.Tests" and Status == "Timeout"

gives me ~ 130 entries including the one @tarekgh reported. The problem is that because the run did no finish and report any results, there are no entries in Jobs table.

WorkItems | where JobName == "7c3e1020-fb91-4832-b4db-c0e1df873655" and  Status != "Pass"

with one entry

vs empty

Jobs | where JobId  == "7c3e1020-fb91-4832-b4db-c0e1df873655"

@danmoseley
Copy link
Member

danmoseley commented Aug 3, 2020

Thank you @wfurt. Just what I needed.

It seems the tests do not normally take long. Of the last couple months, 104 out of 22,461 work items timed out. Those that did not mostly took a minute or so. That suggests they go into a hang state due to a bug, they are not merely slow.

WorkItems
| where Status != "Timeout"
| where Started > todatetime("05/01/20")
| where FriendlyName == "System.Collections.Concurrent.Tests"
| extend Duration = datetime_diff("Minute", Finished, Started)
| summarize count() by Duration, length=bin(Duration, 1)
image

Randomly sampling the console output of the last ~20 timed out runs, suggests certain tests are usually the long running ones.

[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:01
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:01
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:01
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:03
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:03
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:03
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:06
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:07
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:08
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:08
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:09
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:10
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:11
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentBagTests.AddManyItems_ThenTakeOnSameThread_ItemsOutputInExpectedOrder', Elapsed: 00:03:34
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentBagTests.AddTakeWithAtLeastOneElementInCollection_IsEmpty_AlwaysFalse', Elapsed: 00:14:05
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentBagTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:04
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentBagTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:08
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentBagTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:09
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentBagTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:09
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentBagTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:09
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentBagTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:09
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentBagTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:19
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentBagTests.GetEnumerator_ParallelInvocations_Succeed', Elapsed: 00:14:10
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentDictionaryTests.TestGetOrAdd', Elapsed: 00:14:00
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentDictionaryTests.TestGetOrAdd', Elapsed: 00:14:03
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentDictionaryTests.TestUpdate', Elapsed: 00:03:38
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.AddTakeWithAtLeastOneElementInCollection_IsEmpty_AlwaysFalse', Elapsed: 00:14:08
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.Concurrent_Enqueue_TryDequeue_AllItemsReceived', Elapsed: 00:14:03
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.Concurrent_Enqueue_TryDequeue_AllItemsReceived', Elapsed: 00:14:03
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:01
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:03
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:06
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:12
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.ToArray_ParallelInvocations_Succeed', Elapsed: 00:14:01
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.Add_TakeFromAnotherThread_ExpectedItemsTaken', Elapsed: 00:14:09
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.BlockingCollection_WrappingCollection_ExpectedElementsTransferred', Elapsed: 00:03:01
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:02
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:03
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:05
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:15
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:16
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.PushRange_Concurrent_ConsecutiveItemsInEachRange', Elapsed: 00:14:04
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.PushRange_Concurrent_ConsecutiveItemsInEachRange', Elapsed: 00:14:05
[Long Running Test] 'System.Collections.Concurrent.Tests.PartitionerStaticTests.TestLoadBalanceIList', Elapsed: 00:14:01
[Long Running Test] 'System.Collections.Concurrent.Tests.PartitionerStaticTests.TestLoadBalanceIList', Elapsed: 00:14:08

But, typically when a job does "hang", there is more than one "hanging" test, eg.,

   System.Collections.Concurrent.Tests: [Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:05
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:03
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentDictionaryTests.TestGetOrAdd', Elapsed: 00:14:03
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:03

   System.Collections.Concurrent.Tests: [Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.ToArray_ParallelInvocations_Succeed', Elapsed: 00:14:01
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentBagTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:09
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentStackTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:02
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:01

   System.Collections.Concurrent.Tests: [Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentBagTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:19
[Long Running Test] 'System.Collections.Concurrent.Tests.BlockingCollectionTests.TestBugFix544259', Elapsed: 00:14:01
[Long Running Test] 'System.Collections.Concurrent.Tests.ConcurrentQueueTests.GetEnumerator_Generic_ExpectedElementsYielded', Elapsed: 00:14:01
[Long Running Test] 'System.Collections.Concurrent.Tests.PartitionerStaticTests.TestLoadBalanceIList', Elapsed: 00:14:01

Which suggests possibly they are interfering with each other.

@danmoseley
Copy link
Member

danmoseley commented Aug 3, 2020

Hangs are disproportionately likely to be on Windows (but not exclusively on Windows)

WorkItems
| where Status != "Timeout"
| where Started > todatetime("05/01/20")
| where FriendlyName == "System.Collections.Concurrent.Tests"
| extend Duration = datetime_diff("Minute", Finished, Started)
| summarize count() by QueueName
| top 3 by count_;

QueueName count_
ubuntu.1604.amd64.open 41953
ubuntu.1804.armarch.open 18125
osx.1013.amd64.open 17931

WorkItems
| where Status == "Timeout"
| where Started > todatetime("05/01/20")
| where FriendlyName == "System.Collections.Concurrent.Tests"
| summarize count() by QueueName
| top 3 by count_;

QueueName count_
windows.10.amd64.server19h1.es.open 21
windows.81.amd64.open 17
redhat.7.amd64.open 9

@danmoseley
Copy link
Member

Next action seems either

  • try to repro locally although evidence suggests this would be unlikely. Still, could be worth looping over night with no timeout set to see whether we have a repro in the morning
  • wait for "dotnet test -blame" so we can get a hang dump. That is at least a month out.

@danmoseley
Copy link
Member

danmoseley commented Aug 11, 2020

Again in #39640

@danmoseley danmoseley modified the milestones: 5.0.0, 6.0.0, 5.0.0 Preview 8 Aug 11, 2020
@danmoseley
Copy link
Member

I was inclined to move this out, but there is a possibility this is a product bug, so it woudl be good to at least attempt to loop it locally.

@layomia layomia self-assigned this Aug 26, 2020
@marek-safar marek-safar modified the milestones: 5.0.0, 6.0.0 Sep 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Collections runtime-mono specific to the Mono runtime
Projects
None yet
Development

No branches or pull requests