Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative registries #4506

Merged
merged 14 commits into from
Oct 25, 2017
Merged

Alternative registries #4506

merged 14 commits into from
Oct 25, 2017

Conversation

withoutboats
Copy link
Contributor

An implementation of alt registries. Still needs to get tested more thoroughly but seems to work.

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @matklad (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think this should be able to copy a lot of test tests/registry.rs implementation and perhaps parameterize some more of tests/cargotest/support/registry to get tests up and running as well?

@@ -87,6 +87,8 @@ enum Kind {
Registry,
/// represents a local filesystem-based registry
LocalRegistry,
/// represent an alternate registry
AlternateRegistry(String),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, could this be folded into the Registry type above? We could then either store Option<String> or I'd be fine having a global notion of "default registry" which is defaulted to something like crates-io

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It prints out a slightly different message, indicating that its an alternative registry (e.g. when you update the index). But that could also be covered by Option<String>.

@@ -223,9 +225,39 @@ impl SourceId {
Ok(SourceId::for_registry(&url))
}

pub fn alt_registry(config: &Config, key: &str) -> CargoResult<SourceId> {
let registries = config.get_table("registries")?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to various bugs in the implementation today, using get_table will end up bypassing the environment variable fallbacks for fetching keys. Perhaps this could attempt get_string directly (or a similar method) to ensure that config-via-env-var works?

match registries.as_ref().and_then(|registries| registries.val.get(key)) {
Some(registry) => {
let index = match *registry {
CV::String(ref s, _) => s,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is supporting both:

[registries]

foo = "https://path/to/index"

and

[registries.foo]
index = "https://path/to/index"

Should we perhaps canonicalize on just one? (or was it a nice-to-have to support both?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because its similar to how dependencies work (with the version key), in case we ever allow optional additional fields in registries

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed yeah, although that was primarily done for ergonomics (as you're basically always doing that). In this case though, maybe we could just canonicalize on the latter to start out with and extend it in the future if necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion. :)

Copy link
Member

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, should this also add some restrictions for cargo publish? We want to forbid dependencies from crates.io packages to packages from other registries :)

if let Kind::AlternateRegistry(ref key) = self.inner.kind {
format!("alternate registry `{}` ({})", self.url(), key)
} else {
format!("registry `{}`", self.url())
Copy link
Member

@matklad matklad Sep 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚲🏠 : name "alternate registry" does not sound like the best one to present to the user. I can imagine that in a corporate environment "alternative" registry is thought of as a primary one.

Perhaps the messages could look like this?

Updating crates.io registry
Updating registry https://my-company.com/cargo-registry

Hm, now that I've written it down, perhaps we even should use symbolic names for registries for user-facing messages, and not URLs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it seems reasonable to me to basically universally refer to registries through their "short identifier", e.g. crates-io or whatever's in .cargo/config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with identifying them using the symbolic name the user has given it, rather than the index URL.

Don't have a strong opinion about how/if to distinguish crates.io from other registries.

Copy link
Contributor Author

@withoutboats withoutboats Sep 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to with:

registry `crates-io`
registry `foobar` (https://github.com/myco/foobar-index)

That is, hardcode non alt registry sources to crates-io & show the URL of other registries as well as the key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, currently we say:

registry `https://github.com/rust-lang/crates.io-index

I think giving crates-io instead of the index URL is a nicer UX though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we show the registry URL because it could be a local path. Gonna keep showing it in all cases, but call the main registry crates-io.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to with:

Looks great!

I think giving crates-io instead of the index URL is a nicer UX though.

Yes!

Actually, we show the registry URL because it could be a local path. Gonna keep showing it in all cases, but call the main registry crates-io.

That may just be an artifact of the tests, but I'm fine either way

@carols10cents
Copy link
Member

@withoutboats i wanted to try this out but it's not building :( about to try and fix... https://travis-ci.org/rust-lang/cargo/jobs/277089689#L772

does this need to use the new cargo unstable features stuff? we said we would in the RFC....

@carols10cents
Copy link
Member

I think i fixed the build, we'll see :)

@withoutboats
Copy link
Contributor Author

@carols10cents thanks I did a really lazy rebase

@carols10cents
Copy link
Member

I'm able to use a crate from another registry!!!!!!!

Here's my alternate registry (and also crate source): https://gitlab.com/carols10cents/my-alt-registry
And here's a project using the alternate registry, including .cargo/config and Cargo.lock even though you're not supposed to check those in: https://gitlab.com/carols10cents/test-alt-registries

That project uses one crate from my alt registry and one crate from crates.io, that worked fine too :)

Here are some things I ran into:

  1. (p=low) The api key in the registry index's config.json must be present, when I didn't have it there, I got:
$ ~/rust/cargo/target/debug/cargo build
    Updating alternate registry `https://gitlab.com/carols10cents/my-alt-registry` (my-registry)
 Downloading add-one v0.1.0 (alternate registry https://gitlab.com/carols10cents/my-alt-registry (my-registry))
error: unable to get packages from source

Caused by:
  missing field `api` at line 3 column 1

I added "api": "" to config.json and then it was happy. I don't think we want to require an api; should we require that the key be present even if there isn't a value?

  1. (p=blocker) The first cargo build works great! However, once I have a Cargo.lock, I can't do anything else :( I get this error if I try to build twice:
error: failed to parse lock file at: /Users/carolnichols/rust/test-alt-registries/Cargo.lock

Caused by:
  unsupported source protocol: alternate-registry for key `package.source`
  1. (p=med) If I try to publish this crate to my locally running crates.io (I was pretty sure it'd be rejected but I didn't want to pollute production just in case), the publish does get rejected by cargo (and never hits crates.io) but the error message isn't great:
error: crates cannot be published to crates.io with dependencies sourced from a repository
either publish `rand` as its own crate on crates.io and specify a crates.io version as a dependency or pull it into this repository and specify it with a path and version
(crate `rand` has repository path `registry https://github.com/rust-lang/crates.io-index`)

rand isn't even the crate from my custom registry, I'm not sure what is getting confused here :-/ I think we should fix this error message before merging this PR.

@withoutboats
Copy link
Contributor Author

Just pushed up some commits:

  • Split SourceId into its own module (sorry this may make the diff harder to review).
  • Made the api key optional in the registry format.
  • Adjusted what we print to match what we discussed previously.
  • Fixed the lockfile issue @carols10cents identified (the problem stemmed from the fact that the key was not roundtripped to the lockfile, but was considered when comparing two SourceIds, so when you compared the source from the toml and the lockfile they weren't equal).

Still not addressed:

  • The bad error message on upload to crates.io.
  • Whether or not to support both key = index and key as object with index member syntaxes.

Also, this should probably be feature gated, right?

@@ -293,7 +275,7 @@ impl SourceId {

pub fn is_default_registry(&self) -> bool {
match self.inner.kind {
Kind::Registry => {}
Kind::Registry if self.inner.registry_key.is_none() => {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this clause be removed? It seems like it may cause different behavior between:

a = "0.1"
b = { version = "0.1", registry = "crates-io" }

kind: Kind,
// e.g. the exact git revision of the specified branch for a Git Source
precise: Option<String>,
registry_key: Option<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be best represented as a Registry(String) below in the Kind type, perhaps?

Copy link
Contributor Author

@withoutboats withoutboats Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how it was originally represented and I agree its cleaner but it required adding a custom impl of Hash and PartialEq for Kind, whereas this structure can piggyback the existing work to make two git refs normalize.

Possibly I'm just piling on the technical debt here (i.e. we should store these git specific fields in the Kind also), and I could refactor the whole thing a bit instead of doing it more.

@@ -205,7 +205,7 @@ pub fn registry(config: &Config,
src.update().chain_err(|| {
format!("failed to update {}", sid)
})?;
(src.config()?).unwrap().api
(src.config()?).unwrap().api.unwrap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this avoid the unwrap and return an error instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its guaranteed not to fail (unless you've manually corrupted your index or something) because the src we're accessing is crates.io.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah true yeah, but this may help future-proof for custom registries?

let new_source_id = match (details.git.as_ref(), details.path.as_ref()) {
(Some(git), maybe_path) => {
let new_source_id = match (details.git.as_ref(), details.path.as_ref(), details.registry.as_ref()) {
(Some(git), maybe_path, _) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll want to match this here as well, and without backwards compatiblity concerns (like the path dependency) we should just return an error for both git = "foo", registry = "bar" in a dependency specification

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think this code is also the source of the bad error message @carols10cents mentioned above.

@@ -895,7 +896,7 @@ impl TomlDependency {
let loc = git.to_url()?;
SourceId::for_git(&loc, reference)?
},
(None, Some(path)) => {
(None, Some(path), _) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, I think we should return an error if the registry is specified but a path is also specified

@alexcrichton
Copy link
Member

Looking good!

Some other thoughts:

  • yeah let's feature gate this by default
  • I'm in favor of just requiring index = "url" syntax in the config
  • On second thought, maybe we shouldn't store the "short identifier" in SourceId at all? We certainly can't extract it from a lock file (right?) in which case I think it's only used for pretty printing and we can handle that by just passing in a Config whenever we pretty-print
  • Before landing we'll want tests exercising this locally as well

@withoutboats
Copy link
Contributor Author

On second thought, maybe we shouldn't store the "short identifier" in SourceId at all? We certainly can't extract it from a lock file (right?) in which case I think it's only used for pretty printing and we can handle that by just passing in a Config whenever we pretty-print

It is only used for pretty printing, and we can't extract it from the lock file (right now in the code I believe if we ever pretty print any source from the lockfile it will claim its from crates-io...). Just doing a reverse lookup from the config presents the problem where someone has created two names for the same URL, and we don't know which to print (that's definitely an edge case though).

The 'correct' solution would be to record the registry name for each package to the lockfile, but the lockfile's current syntax isn't really conducive to that.

@alexcrichton
Copy link
Member

Hm ok, then I think I'm in favor of "let's pass Config when we want a pretty print". I'm not too worried about two urls pointing to a name, we can presumably just pick one. It's sort of intentional that the lock file only encodes the URL as that really is the source of truth here, it may be the case that some machines call the same registry different names perhaps but we wouldn't want to generate multiple lockfiles for that.

@stephanbuys
Copy link

Any thoughts around the yank command? I can see it coming up as a natural next question the moment these changes go live, especially when running an internal crates.io server.
The work here will definitely help with setting up test crates.io servers. I wanted to run yank while trying to replicate rust-lang/crates.io#1038 on my local dev instance.

From a quick scan of the current functionality registries are used by login, publish, [build/run/fetch/...] and yank

@withoutboats
Copy link
Contributor Author

@stephanbuys This PR (and the connected RFC) doesn't include support for cargo publish either (or cargo login for that matter). For modifying a registry other than crates.io, cargo will not be the tool to do that in the immediate future, it will just be able to download packages from those registries.

There's a good chance that changes eventually of course but this is an incremental step, and publish/yank etc are out of scope for this PR.

@stephanbuys
Copy link

Ok - got it 😃In a sense publish and yank already support alternates through the publish --host and yank --index flags, I'm assuming that the next step would just be to "wire" those up to support the new .cargo/config and other Cargo.toml related changes introduced? (In a separate/follow-up PR of course)

@withoutboats
Copy link
Contributor Author

@stephanbuys In a sense, yes, that would work. But in connection with this PR, we're also documenting/standarding the index format. To provide the same level of support for publish/yank, we'd need to do the same for the crates.io API.

(Obviously if you run a clone of crates.io it would Just Work, but to support people using a different implementation for whatever reason we would need a spec.)

@carols10cents
Copy link
Member

Can confirm that the api key is no longer needed in config.json and I can build, test, etc with a Cargo.lock present :) 👍 💯

@bors
Copy link
Collaborator

bors commented Sep 23, 2017

☔ The latest upstream changes (presumably #4525) made this pull request unmergeable. Please resolve the merge conflicts.

@withoutboats
Copy link
Contributor Author

In the current state of the PR this issue cited by @carols10cents is the only outstanding issue AFAIK:

(p=med) If I try to publish this crate to my locally running crates.io (I was pretty sure it'd be rejected but I didn't want to pollute production just in case), the publish does get rejected by cargo (and never hits crates.io) but the error message isn't great:

Right now, I get that it couldn't find add-one (the crate in Carol's registry) in crates.io.

This seems like a hard problem to solve correctly because the fact that these come from alternative registries is not roundtripped to the lockfile. I believe an easy solution like gating on the URL not matching crates.io will break the behavior of people who depend on (pure) mirrors but publish to crates.io today.

I am inclined to change the lockfile format to track alt-registry dependencies differently from registry dependencies, but so far I've avoided doing that. Thoughts @alexcrichton?

@withoutboats
Copy link
Contributor Author

Alt registry crates with dependencies on crates.io are not working right now, still investigating. Once that works and there's a test for it I think ready to merge.

}
None => Err(format!("Required unknown registry source: `{}`", key).into())
}
if let Some(index) = config.get_string(&format!("registries.{}.index", key))? {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be simplifyable further with:

let index = match config.get_string() {
    Some(i) => i,
    None => bail!(...),
};
// ...

@alexcrichton
Copy link
Member

Once that works and there's a test for it I think ready to merge.

Sounds good to me!

Could you also be sure to add a test where a package in an alternate registry depends on another package in an alternate registry? (deps within the same registry)

@cswindle
Copy link
Contributor

@withoutboats, what is the current status of this PR? Is the issue that you hit above difficult to solve, or you just haven't had time to sort it? If there is anything that I can do to move the PR along then let me know.

@withoutboats
Copy link
Contributor Author

@cswindle Sorry for the delay - the issue with depending on crates.io from an alt registry is a bit tricky. Today, the resolver assumes that, for a crate in a registry, all of its dependencies will come from the same registry. How that assumption gets made is not super clear, and the resolver code is a bit complex, so I hadn't found it.

You're welcome to take a stab at it if you want! @alexcrichton thought he knew where the issue was, and we were going to pair on it while we were both in Montreal last week but didn't get a chance to. We'll be in Columbus together next week and probably pair on this to get it ready to merge then (Tues or Weds), unless you decide to find the bug first. :-)

@cswindle
Copy link
Contributor

@withoutboats, I took a look into the issue and found that it was caused by the fact that in the registry we do not store the index to use, hence it is always resolving the dependency using the current registry. You can find the code changes that I made to fix this here:

https://github.com/cswindle/cargo/tree/alt-regs-fixes

I have not yet run through the other tests, but the test for the specific scenario passes. Let me know if you have any queries about my changes.

@withoutboats
Copy link
Contributor Author

Thanks @cswindle! I'm going to be flying tomorrow (and preparing to fly today) but on Tuesday I'll try to merge your change in and see if we can get this landed. 🎉

@cswindle
Copy link
Contributor

I have now added the extra test that @alexcrichton asked for and have verified that all tests pass, so hopefully come Tuesday you will be able to get this in without needing to do too much.

Note that at the moment this does not include adding the registry when publishing, but I think that is fine as I can make those changes in #4568 as that relates to allowing publish.

@cswindle
Copy link
Contributor

@withoutboats, It looks like we both added the same tests so we should remove the duplication before merging.

@cswindle
Copy link
Contributor

@withoutboats, I have merged master into my branch and fixed up the tests.

@cswindle
Copy link
Contributor

Is that a genuine failure on the AppVeyor build, seems odd that it would be specific to the code changes.

@withoutboats, can you remove the macro_use in alt-registry.rs as I added it when I was playing around with fixing the build and it is unused and produces a warning.

@matklad
Copy link
Member

matklad commented Oct 25, 2017

Is that a genuine failure on the AppVeyor build, seems odd that it would be specific to the code changes.

Nope, see #4659

@alexcrichton
Copy link
Member

@bors: r+

Thanks so much @withoutboats and @cswindle! I think this should be able to land now w/ CI fixes and otherwise the code looks great!

@bors
Copy link
Collaborator

bors commented Oct 25, 2017

📌 Commit 2445497 has been approved by alexcrichton

@bors
Copy link
Collaborator

bors commented Oct 25, 2017

⌛ Testing commit 2445497 with merge e3b0cf6...

bors added a commit that referenced this pull request Oct 25, 2017
Alternative registries

An implementation of alt registries. Still needs to get tested more thoroughly but seems to work.
@bors
Copy link
Collaborator

bors commented Oct 25, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing e3b0cf6 to master...

@bors bors merged commit 2445497 into rust-lang:master Oct 25, 2017
bors added a commit that referenced this pull request Nov 18, 2017
Add support for publish to optionally take the index that can be used

This form part of alternative-registries RFC-2141, it allows crates to optionally specify which registries the crate can be be published to.

@carols10cents, one thing that I am unsure about is if there is a plan for publish to still provide index, or for registry to be provided instead. I thought that your general view was that we should move away from the index file. If we do need to map allowed registries to the index then there will be a small amount of extra work required once #4506 is merged.

@withoutboats, happy for this to be merged into your branch if you want, the main reason I did not base it on your branch was due to tests not working on there yet.
@ehuss ehuss added this to the 1.23.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-registries Area: registries relnotes Release-note worthy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants