Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 'InvalidOperationException' with message Collection was modified; enumeration operation may not execute in MultithreadedAnalyzeCommandBase, which is raised when analyzing with the --hashes switch. #2447

Merged
merged 11 commits into from
Feb 14, 2022

Conversation

eddynaka
Copy link
Collaborator

@eddynaka eddynaka commented Feb 11, 2022

Description

BinSkim uses the MulthtireadedAnalyzeCommandBase and it was throwing InvalidOperationException with the following stack:

System.InvalidOperationException with message "Collection was modified; enumeration operation may not execute."
         at System.Collections.Generic.Dictionary`2.Enumerator.MoveNext()
         at Microsoft.CodeAnalysis.Sarif.Driver.MultithreadedAnalyzeCommandBase`2.LogCachingLogger(TContext rootContext, IAnalysisLogger logger, TContext context, Boolean clone) in C:\Microsoft\sarif-sdk\src\Sarif.Driver\Sdk\MultithreadedAnalyzeCommandBase.cs:line 311
         at Microsoft.CodeAnalysis.Sarif.Driver.MultithreadedAnalyzeCommandBase`2.WriteResultsAsync(TContext rootContext) in C:\Microsoft\sarif-sdk\src\Sarif.Driver\Sdk\MultithreadedAnalyzeCommandBase.cs:line 272

This happens when multiple files are being analyzed and they share the same context (the files have the same hash). The first thread finishes and starts to write, and the second thread is still writing in the same context, generating the exception.

The following issues were created:

@michaelcfanning
Copy link
Member

If you pursue this change, please be sure to explore a test approach for the problem. Embedding a stress harness at unit-test time isn't a very reliable approach and can slow down testing. One idea is to either test or build into the product itself some enforcement of whatever invariant is being broken here.

What I think is happening based on your analysis is that someone is writing to a caching logger after another thread thinks an analysis is finished and it can retrieve the results list. And so, what we could try is to alter the caching logger to put some sort of block between these phases. Think about it. :) Remember, we have the analysis finished event that should tell us when we think we're done.

But the trickiness here is that our logger is handing back a results object that is a mutable IList instance. What if, after receiving the analysis completed event, we converted this data to an immutable list?

As you consider that approach, also consider other possible threading issues in this class. One trick here is to retain data that points to the single thread that we think has permission to mutate the results collection. And to raise an exception in cases where someone appears to be attempting a mutating operation on an incorrect thread. This won't help us fix anything in itself but it should help use raise failures much earlier in the process (even in cases where the broken thread access doesn't cause an apparent problem/exception today). This approach also makes our stress more efficient, again because it forces earlier manifestation of problems. All of this commentary, however, hinges on providing a different mutation/access strategy for the results collection. Today, we just hand off the list of results with no good sense of what happens next.

@@ -1370,7 +1370,7 @@ public void AnalyzeCommandBase_MultithreadedShouldUseCacheIfFilesAreTheSame()
generateSameOutput: false,
expectedResultCode: 0,
expectedResultCount: 7,
expectedCacheSize: 0);
expectedCacheSize: 20);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expectedCacheSize

Can you explain what you learned here?

@eddynaka eddynaka changed the title Simplifying multithreaded analysis when hashes are the same Fix InvalidOperationException when --hashes is enabled and using MultithreadedAnalyzeCommandBase. Feb 14, 2022
@eddynaka eddynaka marked this pull request as ready for review February 14, 2022 20:14
[Fact]
public void MultithreadedAnalyzeCommandBase_EndToEndMultithreadedAnalysis()
{
string specifier = "*.xyz";
Copy link
Member

@michaelcfanning michaelcfanning Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specifier

Always writeline the random seed (in a test that uses the random instance) on entrance of the test. See other tests for a nice reporting pattern. #Resolved

@@ -7,6 +7,7 @@
* BREAKING: Fix `InvalidOperationException` when using PropertiesDictionary in a multithreaded application, and remove `[Serializable]` from it. Now use of BinaryFormatter on it will result in `SerializationException`: Type `PropertiesDictionary` is not marked as serializable. [#2415](https://github.com/microsoft/sarif-sdk/pull/2415)
* BREAKING: `SarifLogger` now emits an artifacts table entry if `artifactLocation` is not null for tool configuration and tool execution notifications. [#2437](https://github.com/microsoft/sarif-sdk/pull/2437)
* BUGFIX: Fix `ArgumentException` when `--recurse` is enabled and two file target specifiers generates the same file path. [#2438](https://github.com/microsoft/sarif-sdk/pull/2438)
* BUGFIX: Fix `InvalidOperationException` when `--hashes` is enabled and using `MultithreadedAnalyzeCommandBase`. [#2447](https://github.com/microsoft/sarif-sdk/pull/2447)
Copy link
Member

@michaelcfanning michaelcfanning Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InvalidOperationException

Always report the exception type, the exception message and where the exception source.

Fix 'InvalidOperationException' with message Collection was modified; enumeration operation may not execute in MultithreadedAnalyzeCommandBase, which is raised when analyzing with the --hashes switch.

#Resolved

Copy link
Member

@michaelcfanning michaelcfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@eddynaka eddynaka enabled auto-merge (squash) February 14, 2022 20:37
@eddynaka eddynaka changed the title Fix InvalidOperationException when --hashes is enabled and using MultithreadedAnalyzeCommandBase. Fix 'InvalidOperationException' with message Collection was modified; enumeration operation may not execute in MultithreadedAnalyzeCommandBase, which is raised when analyzing with the --hashes switch. Feb 14, 2022
@eddynaka eddynaka mentioned this pull request Feb 14, 2022
@eddynaka eddynaka merged commit 0177c03 into main Feb 14, 2022
@eddynaka eddynaka deleted the users/ednakamu/fixing-multithreaded-issue branch February 14, 2022 20:45
This was referenced Feb 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants