Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce number of details shown for cycle violations #326

Merged
merged 6 commits into from
Mar 15, 2020

Conversation

codecholeric
Copy link
Collaborator

@codecholeric codecholeric commented Feb 29, 2020

So far in highly coupled legacy code bases there was no way to reduce the amount of details when evaluating

slices().matching("com.myapp.root.(*)..").should().beFreeOfCycles()

This lead to the only indicator of "you really have a cycle problem" being that the test does not terminate or causes a StackOverFlowError because the amount of violations to print overloads the configured heap (assume 100 chars * 250 dependencies * 10 edges * 5000 cycles >= 1.25GB heap with an overly optimistic 1B per char).

This PR introduces default limits (max 100 cycles to detect, max 20 dependencies per edge to report) and makes these limits configurable via archunit.properties in case they don't match users' needs.

@ghost
Copy link

ghost commented Feb 29, 2020

DeepCode's analysis on #0260b5 found:

  • 0 critical issues. ⚠️ 0 warnings and 5 minor issues. ✔️ 0 issues were fixed.

💬 This comment has been generated by the DeepCode bot, installed by the owner of the repository. The DeepCode bot protects your repository by detecting and commenting on security vulnerabilities or other critical issues.


☺️ If you want to provide feedback on our bot, here is how to contact us.

@codecholeric codecholeric force-pushed the less-cycle-details branch 2 times, most recently from 975c9ae to 72dc300 Compare February 29, 2020 15:12
Since having an `archunit_ignore_patterns.txt` should not be the most common case (in particular since there is `FreezingArchRule` now) we should avoid filtering all messages if there are no ignore patterns configured. If the patterns are empty, the predicate will select all violations anyway.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
In general it does not make sense to report any number of cycles, since large numbers will simply result in `OutOfMemoryErrors`, because of all the textual violations occupying the heap. Thus we limit the number of cycles to detect to 100 by default and make this value adjustable via `archunit.properties`. This way users wanting to see more cycles can increase this limit and users with heap problems can further reduce it. If we limit the number of dependencies reported by slice in the next step we can make sure that the heap will by default not have to grow in two digit GB sizes.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
Copy link
Collaborator

@torfmaster torfmaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me in principle. I left some comments on the code.


import static com.tngtech.archunit.library.dependencies.EdgeTest.newEdge;

public class RealLifeGraph {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the edge cases covered by this graph should be described.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I would know the exact edge case, then yes 😉 But unfortunately this graph is super complex and I don't know exactly why all our other graphs did not uncover that unblocking issue I've introduced. I just knew it would probably cost me a couple of hours to find out exactly the constellation causing the issue and I did not want to spend that time.
Or do you have an idea what the problem could be?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no obvious reason why this should fail (but I don't know your erroneous refactoring either). I would propose leaving this as it is. Maybe you could describe the error you introduced during refactoring, but this makes only sense, if this can be explained in two or three sentences.

docs/userguide/008_The_Library_API.adoc Outdated Show resolved Hide resolved
If we add the dependencies immediately to a `SortedSet` and later on propagate the order through `Edge`, we do not need to copy and sort the dependencies again.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
@codecholeric codecholeric force-pushed the less-cycle-details branch 2 times, most recently from 16a4dda to efce7e2 Compare March 6, 2020 17:13
Typically if a cycle between packages occurs, there are some dependencies that were planned and some that were unplanned, where there are a lot of dependencies in the planned direction, but a way less number of dependencies in the wrong direction. We limit the number of reported dependencies per edge by default to 20 and make this limit configurable in case it does not fit for users. That way we can limit the amount of heap necessary for scenarios where there are a huge amount of cycles.

Assume we have no limits, assume further there are 5000 cycles between the analyzed packages, a good number consisting of around 20 edges. Assume this code base is strongly coupled, such that at least some edges contain thousands of dependencies. Say we have on average 10 edges per cycle and on average 250 dependencies per edge. Assume a typical dependency description contains 100 characters. Even if we assume 1 byte per character, we end up with 100 * 250 * 10 * 5000 = 1.25GB of heap necessary once we create the description. This is neither reasonable nor feasible. The new defaults will at least limit this to 2 * 100 * 20 * 10 * 100 = 4MB by default if we simply assume an upper bound of 2 bytes per char (just a rough estimate). It can still be adjusted, but the danger of ArchUnit crashing with an `OutOfMemoryError` in strongly coupled and cyclic code bases is now way less.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
Added a section `Configurations` to the user guide section about the `SliceRuleDefinition`. Together with the log statements and hints in the reported violations, this should be sufficient to guide users to those parameters.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
@codecholeric codecholeric merged commit a0976ec into master Mar 15, 2020
@codecholeric codecholeric deleted the less-cycle-details branch March 15, 2020 21:58
codecholeric added a commit that referenced this pull request Feb 21, 2021
So far in highly coupled legacy code bases there was no way to reduce the amount of details when evaluating

```
slices().matching("com.myapp.root.(*)..").should().beFreeOfCycles()
```

This lead to the only indicator of "you really have a cycle problem" being that the test does not terminate or causes a `StackOverFlowError` because the amount of violations to print overloads the configured heap (assume 100 chars * 250 dependencies * 10 edges * 5000 cycles >= 1.25GB heap with an overly optimistic 1B per char).

This PR introduces default limits (max 100 cycles to detect, max 20 dependencies per edge to report) and makes these limits configurable via `archunit.properties` in case they don't match users' needs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants