This is not actually meant to be merged. I just wanted to put up a draft of a potential refactoring of the analysis module that I have been thinking about.

The idea here is to encapsulate the different per-component analysis functions as classes, and to allow developers to add new ones by implementing `PerComponentAnalysis`. Then you would only have to deal with the actual analysis code, not with any of the surrounding things like creating the graph, setting up multiprocessing, etc.

Other than making it easier to add new methods, my initial benchmarks indicate that this is roughly 6x faster than the current implementation. This is achieved by two main changes. Firstly, the `Graph` object is only created once and then passed to the different analysis steps. Secondly, the different analysis methods run in separate processes.

The main drawback of this approach, as I see it, is that `Graph` instance creation is not parallelized, and it of course also has the overhead of having to pickle/unpickle the data as it is sent back and forth to the subprocesses.

I'd be happy to hear your thoughts about this approach and if you have any suggestions for improvements.