-
-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changed the comparators for Resource, Capability, and Requirement #5807
Changed the comparators for Resource, Capability, and Requirement #5807
Conversation
--- Signed-off-by: Peter Kriens <Peter.Kriens@aQute.biz> Signed-off-by: Peter Kriens <Peter.Kriens@aQute.biz>
Fixes #5793 |
@mnlipp could you take a look at these changes? |
--- Signed-off-by: Peter Kriens <Peter.Kriens@aQute.biz> Signed-off-by: Peter Kriens <Peter.Kriens@aQute.biz>
--- Signed-off-by: Peter Kriens <Peter.Kriens@aQute.biz> Signed-off-by: Peter Kriens <Peter.Kriens@aQute.biz>
I will. Just have to solve some other problem first in order to test it in my environment. |
In order to test your version, I've upgraded everything in my environment to 7.0.0. Nice effect: resolution works again without problems (for whatever reason; I assume that there is no difference between the 6.4.x comparator and the 7.0.0 comparator). Then I've build the p2 repository from your PR and installed it into eclipse. Now the resolution fails (doesn't terminate). I've appended the last entries from the log. I won't claim that I understand this. Failure is somehow always related to the felix webconsole, no matter how long I wait. I've tried putting some things on the blacklist that are not part of my previous (7.0.0) successful solution (e.g.
|
This sounds a bit like
Sorry, I cannot be more helpful here. |
Thanks. Eventually, I'll have to blacklist until things work, done that before. In this special context, however, the question is whether this PR "breaks" something. I would have expected that modifying the comparator has effects on the time required to find the solution. I wouldn't have been surprised if the solution found was slightly different But I find it surprising that the "known solution" isn't found any more. |
Log file attached. (Project is https://github.com/mnlipp/jgrapes-osgi/commits/master (968cf2d6640e7b923fc139258c91a61582cdda1f), resolving |
Hmm. These are you providers for the javax.servlet.http package: Http servlet might not be the absolute worst API in Java, from an OSGi point of view developers have tried since the heydays of EE server to give OSGi a really hard time making every bad semantic versioning and API decision that mankind could possibly come up with. You're torturing the resolver beyond my imagination :-)
Eighteen choices! And then I am not counting that all those JARs lie through their teeth that they support multiple versions for these packages since semantic versioning was not on the radar in the EE world. So the resolver probably sees 40 or 50 candidates. That is really not what it was designed for. I am really happy that you made me cleanup the comparators, they were really awfully bad. However, one of the wisest lessons of Elon Musk is not to optimize a solution, but try to get rid of the problem altogether. It has been my motto all my life, a better design is when it has fewer potential error cases. I think we need to figure out why the solution does not resolve on Apache/Eclipse and I've asked Eclipse's @tjwatson to look over our shoulders. From a bndtools point of view I think the solution for you is trivial, you have to clean up your repositories. The problem you run into is closely associated with #4172, you're having WAY too many options for the resolver. Also in bndtools, the resolution took > 20 seconds on an M2 chip. I haven't got a single bndrun file that takes more than 3 secs to resolve. By having this many options for the servlet (and associated) packages you are begging on your knees for resolver issues. I really appreciate helping to make the resolver better and the work you did. Very welcome. But I suggest we move the log and a commit record of your project and attach it to #4172 because I am pretty sure you've run into this bug. On your site, get rid of all but one servlet API JAR and your resolve will be sub second and more reliable ... Imho, keeping the repository's small saves a tremendous amount of issues, a lot of complicated things tend too get trivial. What do you think? |
Can you describe what you are fixing here. It says fixing #5807, but that is this PR. There were previous issues in BND where the comparator for BND resources was clearly wrong and that caused issues with BND and the resolver. But I find it hard to tell (only looking for a little bit) what this PR is fixing or the behavior it changed. |
@tjwatson sorry. the bug is #5793 but I don't think you're that concerned about it. I've changed the comparators in bnd since they were terrible and did not give consistent ordering. However, @mnlipp, who reported a sorting error due to the instability of the ordering, now has a resolution that succeeds on bndtools but fails on Eclipse. I was just asking if you could take a look at the log output in this bug and see if you maybe recognize the issue, the resolver loops forever. In the mean time I found out the project had 18 (!) jars with the servlet packages in myriad versions. It looks to me like the Apache Felix #4172 (comment) problem? |
Two things: (1) As I don't have a release manager for my OSS projects, I rely on the conventional strategy of trying to use the latest version of each bundle. (These may introduce new bugs but they should fix known security issues.) In order to automate this, my repository plugin scans the maven repositories starting with some explicitly given groups and lower bounds for artifacts and generates an index of these and all their dependencies (yes, the resulting index/repository is too big). All providers for the javax.servlet.http package that are in the "IndexedMaven" repository are actually dependencies of some other bundles, none was included explicitly. From time to time, I raise the lower bounds for the artifacts to reduce the number of bundles in the (generated) repository, but specific selections for my application (when required) are mostly managed by blacklisting bundles. This might push the resolver to its limits, but I've always thought that this is what it was made for. Yes, it takes some time, but I know that the underlying problem is difficult and, hey, how often do you have the opportunity to really see your hardware investment paying off. Unless this PR introduces some "fatal" issue, I will eventually get the resolution working again by adjusting my repository. And I understand that I should try to do this (and will within the next days) in order to provide additional input for evaluating this PR. (2) I still consider it strange that this PR breaks the resolution. I only had a superficial look at the code, and from this it looks like a rewrite. So, complicated as my resolution scenario may be, I wonder if it happens to be the missing (unit)test that reveals a regression bug introduced by this PR. And if this is the case, the PR might eventually not only break my resolving but also other's. If I was responsible for QA, I'd at least insist on some more non-trivial scenarios to be tested with this PR in place and check if they still resolve. |
I think I misunderstood the ask of me. So this is an issue trying to install the latest local build of the development branch+PR changes into an Eclipse installation and the resolution error is coming from that? Or is this error happening from some resolution driving from BND which is not using this PR? |
@tjwatson sorry for not being more clear, my brain turning mush :-(
@mnlipp This is the story, isn't it? The thing is, I've changed the comparators of Resource, Capability, and Requirement so it is kind of a conincedence that it now fails. However, there are 18 Jars with the servlet packages which each have numerous versions so it looks like a prime candidate for the resolver bug. I just wanted you to take a quick look if the log looked familiar with the resolver bug where a resolution never ends. |
@mnlipp I just believe in a very clean work floor to keep productivity up. That said, in maven they at least pick the first encountered version of an artifact. You seem to include multiple versions of the same artifact. This would already solve this problem I think.
I do not think this PR breaks the resolution in any way. Let's assume the bundles we give Eclipse are not a valid solution. The resolver in Eclipse should then tell is why it is bad. There is just no input we could possible give to Eclipse that would warrant not resolving. I.e this by definition is a bug in Eclipse/Apache resolver setup. No input should be able to create this behavior. I always hate these kind of 'coincidences' since they are extremely rare in our world. When two things happen at the same time they are almost always related. However, in this case Eclipse/Apache is by definition wrong by not resolving or claiming there is no resolution. Those are the only two valid outcomes of the API. Without this resolve bug out of the way, there is nothing we can do. I am also pretty confident that if bndtools found a solution, there is is a solution. I've not printed out the wiring yet to inspect it by hand. Kept back by my PTSD of servlets, it feels like having to dive in a cave filled with snakes :-( |
So @mnlipp, what do we do? I am inclined to submit this PR and see what happens? @bndtools/maintainers any feedback to not merge this? |
I am, just now, trying to get the resolution working again (sorry, didn't have time to do it earlier). I've reduced the number of servlet apis. Now (2 minutes ago) I get
I've never seen this before and I'm currently trying to understand it (log4j-api with version 2.21.0 is in my repository). I don't know yet whether this is related to this PR and wasn't "visible" before because of the never ending resolution or whether this is a completely independent issue. It's up to you to wait some more hours and see if something else shows up in my "special environment" or submit the PR now... |
New start. Went back to bnd-7.0.0. Excluded some of the javax.servlet bundles. Speeds up resolving tremendously (without changing the result). Installed this PR version. Resolving works. Regrettably, it takes more time, but I won't complain about that, I know that I'm "torturing the resolver". There is one difference in the result. This (PR) version adds (Looks like not adding it was a bug in the solution found by bnd-7.0.0, though the lack of this bundle has never caused any problems.) To summarize: time required to resolve has changed, result has slightly changed. This is what I'd expect from such a modification. |
so to make you co-responsible when it fails :-), shall we merge? I think this code is a lot better than the previous, it had a lot of known errors. I've been extremely careful to check for potential NPE's. |
Well, go ahead. |
I haven't had time to dig into this in details. But it does sound like the resolver is simply overwhelmed trying to solve the NP-hard problem with the number of possible choices. There are two things that usually help when installing such atrocities.
|
It would be interesting to see if the increased time to resolve is because of the different path that the resolver took to find a solution (ie, more iterations) or if because the comparator itself was slower (similar number of iterations but slower). The fact that it added an extra bundle suggests the former, but perhaps this is something we should try and keep an eye on during testing. |
This is mostly in the resolve context where we use it to order. It would not surprise me if we sort more often than necessary. Looking at what I know about the resolve process, would surprise me if the comparator would make a visible difference. |
Please note that I didn't systematically measure the time! My remark was based on an observation made when switching from one version to the other, which implied a restart of Eclipse. Although I am quite sure that I also had (re-)started Eclipse before the switch to verify the "old" resolution, all kinds of causes may have had an effect on that single observation. Especially that fact that the result of the old (re-tested) resolve process was what was already given while this PR version found a slightly different result may have had a significant effect. Better measurements would be required before pondering too much about this. |
I agree... I don't think it's worth the effort to benchmark formally/accurately. But I do think that it is worth us as alpha testers keeping an eye on it in case any significant performance changes are noted by any of us - that might warrant closer investigation. |
Perhaps, but this is largely a concern of the Felix resolver implementation and its use in Equinox vs. anything BND tools can do. |
I may have something related to this issue here. I don't know if it is problematic or not. I just noticed a difference between 7.0.0 and 7.1 in the resolver result: Requirement was:
7.1. (master) picked:
Both bundles provide the same package version. Question: |
wow, this is shaping up to become a nice tool! I think is ok. There are a lot of rewrapped jars out there. The resolver will pick them more or less randomly. As long as the contracts are fulfilled this is ok. |
Signed-off-by: Peter Kriens Peter.Kriens@aQute.biz