Resolver runs in circles till it eats all the memory #4172

Closed
juergen-albert opened this issue Jun 18, 2020 · 14 comments

@juergen-albert
Contributor

Taking the discussion from #4154 to a new Bug, so we can look at this independently. This may end up becoming a bug for the resolver, but we can do the analysis first.

The current assumption is that the resolver tears itself apart when
a. it has too many candidates for certain requirements
b. the repositories contain artifacts with "unclean" metadata (like Require-Bundle)

An example of a failing resolve is: https://gitlab.com/gecko.io/geckoEMF-Tooling/-/tree/failing_resolve

Here the resolve dies for https://gitlab.com/gecko.io/geckoEMF-Tooling/-/blob/failing_resolve/org.gecko.emf.osgi.codegen/codegen.bndrun. If we remove the runblacklist, the resolve works just fine.

@juergen-albert
Contributor Author

After comparing a few resolver logs I may have found some clues.

The resolver seems to run in circles on requirements for the packages org.osgi.util.function and org.osgi.util.promise, because they are included in and exported by a lot of bundles.

I've seen thousands of entries like:

DEBUG: Candidate permutation failed due to a conflict between imports; will try another if possible. (Uses constraint violation. Unable to resolve resource org.apache.felix.scr [org.apache.felix.scr version=2.1.16.v20200110-1820] because it is exposed to package 'org.osgi.util.promise' from resources org.eclipse.osgi.util [org.eclipse.osgi.util version=3.5.300.v20190708-1141] and org.eclipse.osgi.util [org.eclipse.osgi.util version=3.5.300.v20190708-1141] via two dependency chains.

Chain 1:
  org.apache.felix.scr [org.apache.felix.scr version=2.1.16.v20200110-1820]
    import: (&(osgi.wiring.package=org.osgi.util.promise)(version>=1.0.0)(!(version>=2.0.0)))
     |
    export: osgi.wiring.package: org.osgi.util.promise
  org.eclipse.osgi.util [org.eclipse.osgi.util version=3.5.300.v20190708-1141]

Chain 2:
  org.apache.felix.scr [org.apache.felix.scr version=2.1.16.v20200110-1820]
    import: (&(osgi.wiring.package=org.osgi.service.component.runtime)(version>=1.4.0)(!(version>=1.5.0)))
     |
    export: osgi.wiring.package=org.osgi.service.component.runtime; uses:=org.osgi.util.promise
  org.apache.felix.scr [org.apache.felix.scr version=2.1.14]
    import: (&(osgi.wiring.package=org.osgi.util.promise)(version>=1.0.0)(!(version>=2.0.0)))
     |
    export: osgi.wiring.package: org.osgi.util.promise
  org.eclipse.osgi.util [org.eclipse.osgi.util version=3.5.300.v20190708-1141])

@juergen-albert
Contributor Author

I've got my issues resolved by systematically blacklisting the bundles that cause the errors shown above. Right now, the log is a bit hidden and it is quite cumbersome to find the offending bundles in the mass of log entries.

At the moment the wizard comes up only once the resolve job has finished with any kind of result. I'd like to change this and bring the wizard up while the job is running, so that we can show such messages directly to the user. If I can get the information via the ResolutionCallback as well, I can even list the offending candidates with an option to blacklist them and rerun the resolve process before it gets out of hand.
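
Until something like that exists in the UI, the offending packages can be pulled out of a saved resolver log by hand. The following is a minimal sketch, assuming the log is available as a list of lines and that conflict entries follow the wording of the DEBUG line quoted above. The class name `UsesConflictTally` and the sample lines are made up for illustration; nothing here is part of bnd or the Felix resolver.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts how often each package is named in "Uses constraint violation" log lines. */
final class UsesConflictTally {

    // Pattern derived from the wording of the DEBUG line quoted in this thread.
    private static final Pattern EXPOSED =
        Pattern.compile("Uses constraint violation\\..*?exposed to package '([^']+)'");

    /** Returns package name -> number of matching log lines, in first-seen order. */
    static Map<String, Integer> tally(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = EXPOSED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shortened sample lines in the shape of the quoted entry; not real resolver output.
        List<String> sample = List.of(
            "DEBUG: (Uses constraint violation. Unable to resolve resource a because it is "
                + "exposed to package 'org.osgi.util.promise' from resources b and c via two dependency chains.",
            "DEBUG: some unrelated entry",
            "DEBUG: (Uses constraint violation. Unable to resolve resource d because it is "
                + "exposed to package 'org.osgi.util.promise' from resources b and c via two dependency chains.");
        tally(sample).forEach((pkg, n) -> System.out.println(n + "\t" + pkg));
    }
}
```

Packages with the highest counts are the first places to look for duplicate exporters.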

@juergen-albert
Contributor Author

@bjhargrave could you assign the issue to me?

@pkriens
Member

pkriens commented Jun 19, 2020

I think the approach should be to sort the candidates we provide from the context to the resolver. Could we come up with some heuristic that puts the official bundle at the front and the alternatives at the back?

If you're not extremely careful, blacklisting quickly makes the resolver deteriorate into assembling a plain old -runbundles :-( Just way slower.
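
To make the sorting idea concrete, here is a minimal sketch of a "preferred first" ordering. It assumes the set of "official" bundle symbolic names is supplied by the caller, since the thread does not define how that would be determined. `Candidate` is a hypothetical stand-in, not a bnd or OSGi type, and the sample names and versions are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class CandidateOrdering {

    /** Hypothetical stand-in for the bundle behind a candidate capability. */
    static final class Candidate {
        final String bsn;
        final String version;

        Candidate(String bsn, String version) {
            this.bsn = bsn;
            this.version = version;
        }

        @Override
        public String toString() {
            return bsn + ":" + version;
        }
    }

    /**
     * Returns a copy with preferred bundles moved to the front.
     * List.sort is stable, so the original relative order is kept within each group.
     */
    static List<Candidate> preferredFirst(List<Candidate> candidates, Set<String> preferredBsns) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((Candidate c) -> preferredBsns.contains(c.bsn) ? 0 : 1));
        return sorted;
    }

    public static void main(String[] args) {
        // Illustrative data only.
        List<Candidate> providers = List.of(
            new Candidate("org.eclipse.osgi.util", "3.5.300"),
            new Candidate("org.osgi.util.promise", "1.1.1"));
        System.out.println(preferredFirst(providers, Set.of("org.osgi.util.promise")));
    }
}
```

Because nothing is removed, the alternatives stay available to the resolver as fallbacks, which is the difference from blacklisting.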

@juergen-albert
Copy link
Contributor Author

@pkriens I would prefer an automated solution as well, but additional statistics that a human can read and understand would help too. If we provide such information, we can either solve it by deny-listing the bundles or, even better, throw out the offending dependencies altogether. In my cases they were there by accident anyway.

@juergen-albert
Contributor Author

I plan to set my resident astrophysics PhD up so she can analyse the issue. She is the right person to come up with heuristics for this.

@kriegfrj
Contributor

At the moment the wizard comes up only once the resolve job has finished with any kind of result. I'd like to change this and bring the wizard up while the job is running, so that we can show such messages directly to the user. If I can get the information via the ResolutionCallback as well, I can even list the offending candidates with an option to blacklist them and rerun the resolve process before it gets out of hand.

👍 I'd like to add to this wish list:

  • I like the idea of having the option to have the wizard up while resolution is in progress. However, I think it would still be nice to allow it to run in the background.
  • When running in the background, you could still get useful feedback by updating the message displayed in the progress monitor. This could be done in the ResolutionCallback. That way, if the resolve is running in the background you can see that it's doing something and hasn't simply hung. You can also get an idea if it's stuck in a circle if you see the same bundles keep coming back for consideration.
  • The task progress monitor should also give some indication of percentage complete. This is difficult to do accurately because of the nature of the problem, with more work being discovered as you go. However, there are some tips in the Eclipse community for how to give meaningful progress updates in such cases; I think that section 3.5, "Queues", in this article is relevant to our case.
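
One possible way to keep a progress bar moving when the total amount of work is unknown is to let each finished item consume a share of whatever progress is still left. The sketch below illustrates that idea in plain Java. It is an assumption of this sketch, not a summary of the linked article, and `DiminishingProgress` is a hypothetical class, not an Eclipse API.

```java
/** Progress estimate for a work queue whose total size is not known up front. */
final class DiminishingProgress {
    private final int minDivisor;
    private double done = 0.0; // fraction of the bar already filled, 0.0 to 1.0

    /** minDivisor caps how much a single item may advance the bar (at most 1/minDivisor of the rest). */
    DiminishingProgress(int minDivisor) {
        if (minDivisor < 1) {
            throw new IllegalArgumentException("minDivisor must be >= 1");
        }
        this.minDivisor = minDivisor;
    }

    /**
     * Call after one item has been processed.
     * pending is the number of items still waiting in the queue at that moment.
     */
    double itemFinished(int pending) {
        int divisor = Math.max(pending + 1, minDivisor);
        done += (1.0 - done) / divisor; // never exceeds 1.0, never moves backwards
        return done;
    }

    /** Call once the queue has been drained. */
    double finish() {
        done = 1.0;
        return done;
    }

    public static void main(String[] args) {
        DiminishingProgress progress = new DiminishingProgress(10);
        int[] pendingAfterEachItem = {5, 50, 49, 3, 0}; // the queue grows and shrinks as work is discovered
        for (int pending : pendingAfterEachItem) {
            System.out.printf("%.1f%%%n", 100 * progress.itemFinished(pending));
        }
        System.out.printf("%.1f%%%n", 100 * progress.finish());
    }
}
```

The bar slows down when the queue grows and speeds up as it drains, so a resolve that keeps discovering work is visible as a bar that barely advances.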

@pkriens
Member

pkriens commented Jun 22, 2020

If we make the window non-modal we could even start multiple resolutions.

@bjhargrave bjhargrave added the abeyance need of contributor [requires is:closed] label Jul 10, 2020
@kriegfrj
Contributor

Just adding a couple more data points to this:

Like @juergen-albert suggested, I think this issue occurs when you have multiple bundles exporting the same or nearly the same set of packages.

One clear case where I keep stumbling into this is when trying to resolve a runtime for iDempiere. There are at least two different pairs of culprits:

  1. The manually-resolved set of runbundles for iDempiere contains com.sun.jakarta.mail:1.6.3 and jakarta.mail.api:1.6.3. These bundles contain a different set of packages and are both seemingly required for iDempiere, but both also export the packages for javax.mail.*. The resolver log indicates that a lot of time is being spent trying each of them as candidates to link against other bundles that import these packages.
  2. The runbundles also contain org.apache.commons.logging and org.springframework.spring-jcl, which similarly have different sets of exported packages but with a significant intersection between the two.
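
Pairs like these can be found up front by inverting the export lists. The following is a minimal sketch, assuming the exported packages per bundle have already been collected into a map (for example from the manifests). `ExportOverlap` is a hypothetical helper and the package lists in `main` are illustrative, not read from the real bundles.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ExportOverlap {

    /** Returns package -> exporting bundles, keeping only packages exported by two or more bundles. */
    static Map<String, Set<String>> sharedPackages(Map<String, Set<String>> exportsByBundle) {
        Map<String, Set<String>> providers = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : exportsByBundle.entrySet()) {
            for (String pkg : entry.getValue()) {
                providers.computeIfAbsent(pkg, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        providers.values().removeIf(bundles -> bundles.size() < 2);
        return providers;
    }

    public static void main(String[] args) {
        // Illustrative export lists only.
        Map<String, Set<String>> exports = new LinkedHashMap<>();
        exports.put("com.sun.jakarta.mail", Set.of("javax.mail", "javax.mail.internet", "com.sun.mail.smtp"));
        exports.put("jakarta.mail.api", Set.of("javax.mail", "javax.mail.internet"));
        sharedPackages(exports).forEach((pkg, bundles) -> System.out.println(pkg + " <- " + bundles));
    }
}
```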

The interesting thing is that although the resolver takes a long time to resolve the full set of runbundles (I've actually never seen it finish), when you start the framework with the manually-entered list of runbundles it starts normally & quickly.

@juergen-albert
Contributor Author

Maybe this is related: https://issues.apache.org/jira/browse/FELIX-6358

Short summary: The same bundle from two repositories (in the given case with different jar names) caused the resolver to run in circles. If the same thing happens whenever two identical bundles from different repositories are around, we could have another candidate.

BTW: Usually I try to solve this issue by removing one of the candidates from the repositories or, if this is not possible, by blacklisting one in the bndrun. In my case one came from p2 and one as a transitive dependency of a BOM, so blacklisting is the only option. I tried to put the GAV in the bnd.identity but that did not work. BSN and version are not an option, because then I would blacklist both bundles. Any idea?

@bjhargrave
Member

bjhargrave commented Nov 11, 2020

Short summary: The same bundle from two repositories (in the given case with different jar names) caused the resolver to run in circles. If the same thing happens whenever two identical bundles from different repositories are around, we could have another candidate.

Strangely, I am debugging a fix for this problem right now! The specific issue I am working on is the exact same file being visible from multiple repositories. That is, the Resource objects are equal. If the same bundle is in different files, then their Resource objects are not equal.

This Resource is equal to another Resource if both have the same content and come from the same location. Location may be defined as the bundle location if the resource is an installed bundle or the repository location if the resource is in a repository.
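
The effect of that definition can be sketched with a small value class. This is an illustration only: `Res` is a hypothetical stand-in and not the org.osgi.resource.Resource interface, and the hashes and file locations are made up.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class ResourceIdentity {

    /** Hypothetical stand-in: equal only if both content and location match. */
    static final class Res {
        final String contentSha;
        final String location;

        Res(String contentSha, String location) {
            this.contentSha = contentSha;
            this.location = location;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Res)) {
                return false;
            }
            Res other = (Res) o;
            return contentSha.equals(other.contentSha) && location.equals(other.location);
        }

        @Override
        public int hashCode() {
            return Objects.hash(contentSha, location);
        }
    }

    /** Collapses equal resources while keeping first-seen order. */
    static Set<Res> distinct(List<Res> seen) {
        return new LinkedHashSet<>(seen);
    }

    public static void main(String[] args) {
        Res viaRepoA = new Res("sha-abc", "file:/repo/m2/foo-1.0.jar");  // same file, seen through repository A
        Res viaRepoB = new Res("sha-abc", "file:/repo/m2/foo-1.0.jar");  // same file, seen through repository B
        Res otherFile = new Res("sha-abc", "file:/repo/p2/foo_1.0.jar"); // same content, different file
        System.out.println(distinct(List.of(viaRepoA, viaRepoB, otherFile)).size()); // prints 2
    }
}
```

Under this definition the first two entries collapse into one candidate, while the copy in a different file remains a second candidate and still has to be dealt with separately.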

@bjhargrave
Member

See #4409 for a resolve context fix about duplicate capabilities. This will help when the resources are equal, such as with an index of .m2 and a Maven ImplicitFileSetRepository.

@stale

stale bot commented Nov 13, 2021

This issue has been automatically marked as stale because it has not had recent activity. Given the limited bandwidth of the team, it will be automatically closed if no further activity occurs. If you feel this is something you could contribute, please have a look at our Contributor Guide. Thank you for your contribution.

@stale stale bot added the stale Stale issue or pull request label Nov 13, 2021
@stale

stale bot commented Dec 4, 2021

This issue has been automatically closed due to inactivity. If you can reproduce this on a recent version of Bnd/Bndtools or if you have a good use case for this feature, please feel free to reopen the issue with steps to reproduce, a quick explanation of your use case or a high-quality pull request.
