feat(plugin-essentials): yarn dedupe #1558
Conversation
One question: I thought this was a Yarn v1 bug and that it should deduplicate deps by default. Am I missing something?
@7rulnik Yarn doesn't deduplicate dependencies by default, otherwise installs wouldn't be deterministic and the lockfile would be useless.

Example:

Lockfile:

```
foo@^2.3.4:
  resolution: foo@2.3.4
```

Now if I run `yarn install` again, I'll always get `foo@2.3.4`, even if newer versions matching the range have been published - that's what keeps installs deterministic.

The problem (not actually a problem, this is how it's intended to be) is that Yarn doesn't unlock dependencies that are already locked in the lockfile. This means that with the following lockfile:

```
foo@^2.3.4:
  resolution: foo@2.3.4
```

If I then add a dependency on `foo@2.10.14` (for example via a package that depends on it), the lockfile will become:

```
foo@^2.3.4:
  resolution: foo@2.3.4

foo@2.10.14:
  resolution: foo@2.10.14
```

This lockfile has 2 separate resolutions for the 2 descriptors, even though `foo@2.10.14` satisfies both ranges, so a deduplicated lockfile could contain a single resolution:

```
foo@^2.3.4, foo@2.10.14:
  resolution: foo@2.10.14
```

This is, essentially, what the `yarn deduplicate` command added in this PR does.
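To make the mechanics concrete, here is a minimal, hypothetical sketch (not the plugin's actual code) of how such duplicates could be detected with the `semver` package; the lockfile is simplified to a plain map of descriptors to resolved versions, and `splitDescriptor` is a made-up helper.

```ts
import * as semver from "semver";

// Simplified stand-in for lockfile entries: "<ident>@<range>" -> resolved version.
const lockfile: Record<string, string> = {
  "foo@^2.3.4": "2.3.4",
  "foo@2.10.14": "2.10.14",
};

// For every descriptor, check whether another entry of the same package
// already resolves to a higher version that still satisfies the range.
for (const [descriptor, resolved] of Object.entries(lockfile)) {
  const [ident, range] = splitDescriptor(descriptor);
  for (const [otherDescriptor, otherResolved] of Object.entries(lockfile)) {
    const [otherIdent] = splitDescriptor(otherDescriptor);
    if (otherIdent !== ident || otherResolved === resolved) continue;
    if (semver.satisfies(otherResolved, range) && semver.gt(otherResolved, resolved)) {
      console.log(`${descriptor}: ${resolved} could be deduped to ${otherResolved}`);
    }
  }
}

// Hypothetical helper: split "foo@^2.3.4" into ["foo", "^2.3.4"],
// using the last "@" so scoped packages like "@scope/foo@^1.0.0" also work.
function splitDescriptor(descriptor: string): [string, string] {
  const at = descriptor.lastIndexOf("@");
  return [descriptor.slice(0, at), descriptor.slice(at + 1)];
}
```

Running this against the example lockfile above would report that `foo@^2.3.4` can be deduped from `2.3.4` to `2.10.14`.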
Got it, thanks! I thought it was a bug because Yarn v1 did not have a deduplicate command. I think we should add your explanation to the docs.
Great to see some improvement on the initial work! So the only problem with https://github.com/eps1lon/yarn-plugin-deduplicate was that it is slow? I started working on it again after the 2.1 announcement and it should be working with 2.1 now, after yarnplugins/yarn-plugin-deduplicate#6 is merged.
@eps1lon I'm sorry for how I worded that, it's not that it was slow. I wanted to open a PR in this repo that deduplicates some dependencies (because they started to add up in size) - #1556 - and I tried using your plugin by copying your implementation inside a new command in our repo (because we have some limitations that prevent us from installing third-party plugins in this repo), just to use it for deduplication purposes. Unfortunately, when I tested it, I first encountered a bunch of errors […].

I realize that we are now in a really unfortunate situation (that's mostly my fault), now that you've decided to revive `yarn-plugin-deduplicate`. Again, I'm sorry for how this played out. I'm open to suggestions about what we should do with this, as long as they benefit the community. What do you think about this? How should we proceed?
Should be fixed now if you install from
I'd prefer that as well so that we can test integration.
All good. Having this live in an official repo was always my preference. It being slow wouldn't have bothered me. It wasn't intended to be fast. Build something working first and then make it fast!
Will it support the strategy flag and selective dedupe that yarn-deduplicate supports?
@Bnaya It already supports selective dedupe (via glob pattern rest args). I was initially thinking of leaving strategies out of the first iteration, but now that there's demand I will implement it in this PR. Note: I'll probably get back to this PR in a few days / next week.
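As an illustration of how glob-pattern rest args could gate which packages get deduplicated, here is a tiny sketch; the choice of `micromatch` and the `shouldDedupe` helper are assumptions for the example, not necessarily what the PR does.

```ts
import micromatch from "micromatch";

// Hypothetical: decide which package idents a dedupe pass should touch,
// based on glob patterns passed as rest arguments (e.g. `yarn deduplicate "@babel/*" lodash`).
function shouldDedupe(ident: string, patterns: string[]): boolean {
  // No patterns means "dedupe everything".
  if (patterns.length === 0) return true;
  return micromatch.isMatch(ident, patterns);
}

console.log(shouldDedupe("@babel/core", ["@babel/*"])); // true
console.log(shouldDedupe("lodash", ["@babel/*"]));      // false
console.log(shouldDedupe("lodash", []));                // true
```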
So... the PR is now finished. I've decided to leave out the […]. More details about the […]:
Now I'm going to summon @arcanis to review this PR and hopefully […]
Great work! I'll look at it in detail tomorrow, but I have some questions based on your abstract:

- Why do we need to make network requests when doing the deduplication?
- I'm not familiar with PEP440 - how does it support non-semver ranges (in particular with the highest strategy, which implies sorting the versions as much as possible)?
- I'm not necessarily against using […]
Nope, it doesn't update the packages, it just uses whatever is the highest in the lockfile.

The problem is that there's no clear definition of "highest". For npm semver ranges, the highest is the highest version that satisfies the range, but what is the highest for npm tags? Let's take the following example:

Lockfile:

```
foo@npm:next:
  resolution: foo@npm:1.0.0

foo@npm:^1.1.0:
  resolution: foo@npm:1.1.0
```

Let's now say that the […].

There's also the problem of non-semver ranges, which we wouldn't be able to deduplicate unless we use the resolver, which already has a definition of "highest" - the first candidate.
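As a rough illustration of that distinction, here is a hypothetical helper that only applies a lockfile-only "highest" strategy to ranges `semver` can parse, leaving tags such as `next` alone (or to the resolver); the function name and behavior are illustrative, not the PR's implementation.

```ts
import * as semver from "semver";

// A lockfile-only "highest" strategy can only reason about ranges that semver
// understands; dist-tags like `next` have no obvious ordering, so skip them.
function pickHighestLocked(range: string, lockedVersions: string[]): string | null {
  if (semver.validRange(range) === null)
    return null; // not a semver range (e.g. the `next` tag) - leave it alone

  const candidates = lockedVersions.filter(version => semver.satisfies(version, range));
  return candidates.length > 0 ? semver.rsort(candidates)[0] : null;
}

// `^1.1.0` has a clear "highest" among the locked versions...
console.log(pickHighestLocked("^1.1.0", ["1.0.0", "1.1.0"])); // "1.1.0"
// ...but `next` does not, so it is not deduplicated here.
console.log(pickHighestLocked("next", ["1.0.0", "1.1.0"]));   // null
```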
🤔 It makes sense for […]
So, after more discussions, we've decided to drop both the network access requirement and the reliance on […].

The remaining problems (?): […]
Unless there are any design problems remaining, I think this PR can be reviewed, and I will do my best to address any remaining small problems so that we can hopefully ship this in […]
Co-authored-by: Maël Nison <nison.mael@gmail.com>
Co-authored-by: Kristoffer K. <merceyz@users.noreply.github.com>
What's the problem this PR addresses?
To quote from the command details:
How did you fix it?
This PR adds a new `yarn deduplicate` command that does what its description says. It's inspired by (but the implementation is different from): […]
The current algorithm takes about 30 seconds to run on our repo (after a lot of optimizations; it was 10 minutes before). I couldn't think of a better one that's also fast and produces results while resolving descriptors (so that the output is populated progressively while the deduplication runs; it would look weird if it took 30 seconds to resolve everything and only then showed all 200 results at once).
The current algorithm prefers reusing packages with the highest usage count, rather than the highest version, which means that packages can be both upgraded and downgraded. After discussing with the rest of the team, we came to the conclusion that the best strategy would be to sort the locators by version rather than by usage count. This means that packages can only be upgraded, never downgraded.
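For illustration only, here is a small sketch contrasting the two selection orders described above; the `Candidate` shape and the numbers are made up, not the PR's actual data structures.

```ts
import * as semver from "semver";

// Hypothetical candidate: a locked version plus how many descriptors currently use it.
type Candidate = { version: string; usageCount: number };

// Earlier approach (illustrative): prefer the most-used locator, which can downgrade packages.
const byUsageCount = (a: Candidate, b: Candidate) => b.usageCount - a.usageCount;

// Chosen approach (illustrative): prefer the highest version, so packages are only ever upgraded.
const byHighestVersion = (a: Candidate, b: Candidate) => semver.rcompare(a.version, b.version);

const candidates: Candidate[] = [
  { version: "2.3.4", usageCount: 7 },
  { version: "2.10.14", usageCount: 1 },
];

console.log([...candidates].sort(byUsageCount)[0].version);     // "2.3.4"  (a downgrade for 2.10.14 users)
console.log([...candidates].sort(byHighestVersion)[0].version); // "2.10.14" (upgrade only)
```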
Note: I haven't added tests yet, but I'm working on it.
Checklist