Add support to Cargo for alternative registries #2141

Merged
merged 25 commits into from
Sep 29, 2017

Member

@carols10cents carols10cents commented Sep 6, 2017

Rendered

Tracking issue

This RFC builds on previous work done in RFC 2006. The biggest difference is that this RFC includes a specification for the index format that any registry will need to conform to. Another difference is that this RFC proposes configuring registry locations once, in a .cargo/config, rather than multiple times in each project, both to avoid duplication and to discourage including credentials in each project.

@natboehm and @shepmaster also worked on this RFC :)

@carols10cents carols10cents added the T-cargo Relevant to the Cargo team, which will review and decide on the RFC. label Sep 6, 2017
# Rationale and Alternatives
[alternatives]: #alternatives

A [previous RFC](https://github.com/rust-lang/rfcs/pull/2006) proposed having the registry
Member

The Java ecosystem has gone the other direction. Gradle requires that you specify all of your upstream repositories in your build.gradle, and Maven supports both configuration in the project itself and at the user level.

It seems kind of messy for the dev setup instructions to go from "clone the repo" to "clone the repo, add these registries to your ~/.cargo/config, and make sure the names agree across all of the projects you're working on".

When Cargo searches for a .cargo/config, does it stop at the first one it finds or continue looking and union all of them? One nice option could be to go the union route so you could check a .cargo/config into the repo with the right registry configurations.

One of the points for having the .cargo/config outside of the repository is to avoid checking authentication information into the code-base. From my view this would be a way to support private registries for closed source projects and the common use case is most likely that you will have one internal registry and use crates.io for all publicly available code.

Maybe there could be a cargo add-registry command for the future that can be used to setup any third party registry that is to be used.

Member

Registry authentication information is already stored in a separate file than Cargo.toml and .cargo/config - I don't know why anything would be different here.

Member Author

@sfackler:

When Cargo searches for a .cargo/config, does it stop at the first one it finds or continue looking and union all of them? One nice option could be to go the union route so you could check a .cargo/config into the repo with the right registry configurations.

It continues looking and unifies all of them. I just made a PR to cargo's docs to make this more readily apparent.

@sedrik:

Maybe there could be a cargo add-registry command for the future that can be used to setup any third party registry that is to be used.

That sounds like a great idea! I'll add a note about that :)

@sfackler

Registry authentication information is already stored in a separate file than Cargo.toml and .cargo/config - I don't know why anything would be different here.

You're right that usernames and passwords should probably go in .cargo/credentials instead of .cargo/config, I'll make that change. Right now, only the token to authenticate to a registry's API is stored in .cargo/credentials, so this RFC will be adding the ability to specify a username and password to enable access to either a registry index or an API.
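
As a rough sketch of the split being discussed here, the registry location could live in a config file that is safe to check in, while the secret stays in the per-user credentials file. This assumes the `[registries]` table name and `token` key that Cargo's documentation uses; the registry name `my-registry` and the index URL are placeholders, not values from this RFC. The username and password fields mentioned above are not shown, since their format is not specified in this thread.

```toml
# .cargo/config (can be checked into the repository: it contains no secrets)
[registries.my-registry]
index = "https://my-intranet:8080/index"   # placeholder index URL

# ~/.cargo/credentials (per user, never checked in) would then hold the secret:
# [registries.my-registry]
# token = "<token issued by the registry>"
```

Because Cargo unifies every .cargo/config it finds, a checked-in file like this adds to the user-level configuration rather than replacing it.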

Member

Cool, as long as it's something that can be checked into the repo and doesn't totally suppress user-level configuration I'm on board.

@sedrik

sedrik commented Sep 6, 2017

Thanks for proposing this support. This is a blocker for any kind of Rust adoption at my employer (sadly it does not guarantee that we will adopt Rust).

Has there been a discussion about supporting organizations and private repositories in crates.io similar to how npmjs does it?


```toml
[dependencies]
secret-crate = { version = "1.0", registry = "my-registry" }
```

Member

It'd be nice to support a short form of this for convenience:

"my-registry/secret-crate" = "1.0"

Member

@sfackler could we have that syntax reserved for crate namespacing?

Instead I'd propose: [dependencies.my-registry] secret-crate = "1.0".

Member

What would this mean in that setup?

[dependencies.foobar]
version = "1.0"

Is it a crate called "version" at 1.0 in the "foobar" registry or a crate called "foobar" in the default registry at version 1.0?

Member

Oh dumb me, that syntax already has a meaning... What about this then:

[registry.my-registry.dependencies]
secret-crate = "1.0"

Member

Another alternative (loosely based on how URLs work):

"//my-registry/secret-crate" = "1.0"


```toml
[registry.$choose-a-name]
index = "https://username:password@my-intranet:8080/index"
```

Member

It seems like credentials should live separately. We've recently moved the crates.io publish token out of .cargo/config.

@withoutboats
Contributor

withoutboats commented Sep 6, 2017

Awesome RFC @carols10cents (et al!).

I have a branch of cargo which I believe implements this, though I haven't tested it thoroughly. The only pertinent difference I'm aware of is in the format for declaring a new registry. What I went with was:

  • I called the table registries instead of registry; I believe the registry name is already used in .cargo/config for something else (possibly deprecated? I don't recall at the moment).
  • I supported both just making the registry a key to a URL and having an object with an index member like you propose here.

e.g:

[registries]
foobar = "https://github.com/foobar-co/foobar-index"

[registries.bazquux]
index = "https://github.com/bazquux-org/bazquux-index"

Another possible format choice would be to instead support a syntax like this in the toml, instead of having a registry key in the dependency object itself:

[registry.foobar-co.dependencies]
# all the dependencies in this table come from foobar-co

My branch doesn't implement that, but it's worth considering, since it makes it easier to add more dependencies from that alternate registry.

it is possible to have a local crates.io server which crates can be pushed to, while still making
use of the public crates.io server.

We would also like to support the use of crates.io mirrors. These differ from alternative
Member

How would mirrors work in this setup? We'd need some way to say that a registry "acts as" https://github.com/rust-lang/crates.io-index, right?

Member Author

Oops. I forgot to put in details about mirrors. I'm starting to think that could be separate from this RFC-- we already support source replacement but I want to extend it so you can list multiple mirrors and cargo will automatically fall back if one is inaccessible. That's starting to feel separate, so I'm going to take this paragraph about mirrors out.

@Erik-S

Erik-S commented Sep 7, 2017

This is a feature that's important for corporate use, so I'll chime in with my experience.

Storing the passwords in the working directory is generally a bad idea, because in corporate environments the working directory is often on a share drive with fairly open permissions. (Even allowing them to be stored there isn't ideal, because someone will make a mistake.)

The best solution is to put username and password into the OS-specific keystore (GNOME Keyring / Windows Credentials Management / Apple Keychain).

If I read #3978 correctly, Cargo access tokens are already stored in ~/.cargo/credentials. Putting passwords there wouldn't be ideal, but would be much better than the working directory or main Cargo config file.

Storing the username and password as part of the URL is very inflexible. In the future, we may want to support Kerberos/SAML/LDAP/etc logon, so storing the USER/PASSWORD/AUTH_TYPE as separate fields is a good idea.

Ideally the user name would not be in the same file as the registry-name to URL mapping, so the mapping file can be checked in and it will "just work" within a company LAN.

@bbatha

bbatha commented Sep 11, 2017

I mentioned this on #2006 and rust-lang/cargo#4208 but I want to make sure that it doesn't get lost in the shuffle. I want to make sure that when new registries are specified that it is possible to specify the full hostname and root path for the registry and not just the host name. For multiple repository hosting solutions like nexus and artifactory it needs to be possible to specify a path as well. For instance, artifactory hosts npm repos at https://host.company.com/api/npm/private-repo so you can host multiple repo types and multiple repos for the same language. Specifying just the host should have a good default but it should be overridable.

@withoutboats
Contributor

@bbatha You specify the url of the registry index, which is required to contain the url of the backing store. It is not possible to specify just the hostname.

For example, crates.io would be declared:

[registry.crates-io]
index = "https://github.com/rust-lang/crates.io-index"

- `name`: the name of the crate
- `vers`: the version of the crate this row is describing
- `deps`: a list of all dependencies of this crate
- `cksum`: a checksum of this version's files
Member

s/this version's files/the tarball downloaded/

{
"name": "serde",
"req": "^1.0",
"registry": "https://crates.io",
Member

Should this, like allowed-registries above, specify the index rather than this URL?

Member Author

Yeah, probably.

specifying the list of registries that are allowed with `cargo publish`.

```
publish-registries = ["my-registry"]
```

Member

Cargo currently has a publish = false key for totally disallowing publishing, I wonder if we could perhaps overload it?

publish = true # default, publish to crates.io
publish = false # don't publish this anywhere
publish = [] # don't publish this anywhere
publish = ["https://some-other-registry.com"] # publish somewhere other than crates.io

Member Author

I thought about that, do TOML/serde support different types like that???

Member

Yeah - being able to do that kind of thing was one of the main advantages of serde over rustc-serialize. A simple way of doing it is via the "untagged" enum representation: https://serde.rs/enum-representations.html

Member

You can even find this in Cargo today!

Member Author

TIL!

@alexcrichton
Member

In the detailed design section there's a note of related issues:

In order to make working with multiple registries more convenient, we would also like to support

Just to be clear, though, this RFC isn't specifically proposing solutions to these? Are they possible future extensions?

(I'd be fine adding solutions for them to this RFC, I think they may be all relatively trivially fixable)

@carols10cents
Member Author

@sedrik

Has there been a discussion about supporting organizations and private repositories in crates.io similar to how npmjs does it?

crates.io is likely to remain open source only, but stay tuned :)

@carols10cents
Member Author

@withoutboats

I have a branch of cargo which I believe implements this, though I haven't tested it thoroughly. The only pertinent difference I'm aware of is in the format for declaring a new registry. What I went with was:

Awww I was close!!! I like what you've implemented though, I'm going to update this to go with yours :)

Currently, the knowledge of how to create a file in the registry index format is spread between
Cargo and crates.io. This RFC proposes the addition of a Cargo command that would generate this
file locally for the current crate so that it can be added to the git repository using a mechanism
other than a server running crates.io's codebase.
Member

Hm, for this use-case, we'll also need a way to make a .crate file manually. This is already handled by cargo package. Then perhaps cargo package could create both a .crate tarball and a .json index metadata file?

Member Author

I could see us rolling the metadata into package eventually, yeah. I think we should try having them separate at first, cargo already has enough things tangled up with each other that could be independent ;)

@matklad
Member

matklad commented Sep 16, 2017

@rfcbot reviewed

@rfcbot
Collaborator

rfcbot commented Sep 18, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Sep 18, 2017
@tomwhoiscontrary

tomwhoiscontrary commented Sep 20, 2017

This seems pretty cool. I really like the idea of putting the registry names in the checked-in project, and the name-to-address mapping in the environment!

However, if i work for a paranoid company that wants all crate downloads to come from an internal registry, and i want to build some random project i've cloned off Github, can i do that?

That is, if i have a project whose Cargo.toml contains this:

[dependencies]
byteorder = "1.0.0"

Can i force Cargo to go to repo.initech.com rather than crates.io to get it?

I got the impression on reading the RFC that i wouldn't be able to do that. AIUI, the only way to get a crate to come from a specific registry is to say so in the dependency declaration. I would have to say:

[dependencies]
byteorder = { version = "1.0.0", registry = "initech-internal" }

Happily, i don't work for such a paranoid company, so i can get public crates from crates.io and internal crates from some internal registry. But in the past, i have worked for companies where this would not have flown. So, if this isn't currently possible, could we have it? Perhaps we could define a name for crates.io ("default", "crates-io", "pub", whatever), and say that will be used by default. Then i could get those crates from my internal registry by redefining the address that name maps to.


A valid registry index meets the following criteria:

- The registry index is stored in a git repository so that Cargo can efficiently fetch incremental

(1) Can the address of the index be a file: URL, or a plain file path? That could be really useful for setting up a local repository. I've done this a few times in the Java world. It could also be useful in reptilian corporate environments where it's easy to put something on a shared drive, but much harder to stand up a server.

(2) Could we allow plain HTTP as well as Git? I could imagine writing a little registry server (20-30 lines of Java!) to serve up my team's internal crates. We only have a few, and don't update them often, so downloading the whole index wouldn't take long. Whereas writing or setting up a Git server would be quite a headache.

Member Author

(1) Can the address of the index be a file: URL, or a plain file path?

Yes indeed, this is how I publish to a local instance of crates.io when developing, actually.

(2) Could we allow plain HTTP as well as Git? I could imagine writing a little registry server (20-30 lines of Java!) to serve up my team's internal crates. We only have a few, and don't update them often, so downloading the whole index wouldn't take long. Whereas writing or setting up a Git server would be quite a headache.

For now, we're going to stay with git; being able to send only the delta of changes rather than the whole change is a huge win. While you might only have a few crates to start with, you might have more later, or just more versions of those few crates.

Git includes straightforward ways to run a server, if it's within your firewall and unauthenticated, it's not bad at all.
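
For the local-index case in (1), a sketch of what the configuration might look like, assuming the `[registries]` table form that Cargo's documentation uses. The registry name and the path are made up for illustration.

```toml
# .cargo/config
[registries.team-local]
# A file: URL pointing at a git repository on local disk or a shared drive
index = "file:///srv/cargo/team-index"   # hypothetical path
```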

Please accept my belated thanks for your response! The point of writing a server would be to proxy to our existing infrastructure, so being able to run a Git server doesn't really help. But being able to use local indices addresses most of the internal use cases i can imagine, so that shouldn't matter.

@sfackler
Member

@tomwhoiscontrary repository mirrors were originally discussed a bit in this RFC but have since been pulled out. There'll presumably be a follow-up RFC to deal with that use case.

@carols10cents
Member Author

carols10cents commented Sep 20, 2017

@tomwhoiscontrary

Can i force Cargo to go to repo.initech.com rather than crates.io to get it?

Cargo already supports source replacement, so you are able to do this today! 🎉

What isn't supported yet is being able to list multiple mirrors and automatically falling back to whichever is available. Running a mirror, whether pre-emptively caching everything on crates.io or only caching what's requested, is also not simple right now. As @sfackler noted, neither of these concerns is especially related to the changes in this RFC.
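
A minimal sketch of source replacement for the repo.initech.com case, using the `[source]` tables from Cargo's source replacement documentation. The source name `initech-mirror` and the index path are assumptions; only the host name comes from the question above.

```toml
# .cargo/config
[source.crates-io]
replace-with = "initech-mirror"   # redirect every crates.io dependency

[source.initech-mirror]
# URL of the mirror's registry index (hypothetical path on the internal host)
registry = "https://repo.initech.com/crates-index"
```

With this in place, a plain `byteorder = "1.0.0"` dependency would be fetched through the mirror without any change to Cargo.toml. Source replacement assumes the replacement serves exactly the same crates as the source it replaces.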

@rfcbot
Collaborator

rfcbot commented Sep 28, 2017

The final comment period is now complete.

@aturon aturon merged commit 206318a into rust-lang:master Sep 29, 2017
@aturon
Member

aturon commented Sep 29, 2017

This RFC has been merged! Tracking issue.

Thanks @carols10cents, @natboehm and @shepmaster!

@carols10cents carols10cents deleted the alternative-registries branch October 2, 2017 23:09
@Centril Centril added the A-registry Proposals relating to the registries of cargo. label Nov 23, 2018
@przygienda

Some observations from working with this:

  • For historical reasons, an empty string now means crates.io as the registry. This is risky, since it prevents third-party registries from defining in "allowed-registries" whether crates.io is allowed or not unless they add an empty string to it, which looks strange ...
  • Are crate IDs unique across registries? I don't think that's possible. Given that, "copying" across registries will become untrackable because names can clash.

@przygienda

I'm writing code and struggling to see what the schema would look like if multiple registries are involved. Most importantly:

  • What is the primary key of a registry? It can't be the Git URL, since that moves, nor the SHA of the first commit, which is too risky.

Suggestion: add a registry-id to config.json, something like a UUID, so that mirrors of the same registry and a relocated registry can be recognized ...

@WiSaGaN
Contributor

WiSaGaN commented Jul 20, 2019

I searched this thread, but could not find relevant information about how the crates that are published to alternative registry should depend on crates that are on crates.io registry. It seems that leaving it empty means that dependency is in the same registry thus not crates.io. Should all those dependencies use a proxy-name such as "cratesio", and then everyone define locally the index, or is there another way?

@ehuss
Contributor

ehuss commented Jul 20, 2019

@WiSaGaN I'm not sure which part you mean by "leaving it empty". In Cargo.toml, the default is crates.io. In the index, if it is null, then the dependency points to the same index. Cargo will automatically handle the translation when publishing. Thus a crates.io dependency will be stored in the index as "https://github.com/rust-lang/crates.io-index".

For dependencies cargo downloads, the crate file tracks the URL (not the registry name), so there's no need to have local definitions.

Documentation can be found at https://doc.rust-lang.org/cargo/reference/registries.html.

@WiSaGaN
Contributor

WiSaGaN commented Jul 20, 2019

@ehuss , say I want to publish a crate called bar in my alternative registry foo. bar depends on crates.io serde package, and also depends on another crate bar-dep in the same alternative registry foo. How should I write my bar Cargo.toml dependency section? (bar is going to be published to alternative registry foo)
Should it be

[dependencies]
serde = "1.0"
bar-dep = { version = "1.0", registry = "foo" }

Or

[dependencies]
serde = { version = "1.0", registry = "cratesio" }
bar-dep = "1.0"

And define registry entries in .cargo/config respectively?

@ehuss
Contributor

ehuss commented Jul 20, 2019

The first one.
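
For that first option, a sketch of the matching registry entry, assuming the `[registries]` table form from Cargo's documentation. Only `foo` needs to be defined, since dependencies without a `registry` key default to crates.io in Cargo.toml. The index URL is a placeholder.

```toml
# .cargo/config
[registries.foo]
index = "https://example.com/foo-index"   # placeholder URL for foo's index
```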

@paddycarey

paddycarey commented Jul 20, 2019

Hi @ehuss, I'd just like to clarify the answer you've given here. My current understanding is that library authors should use the second of @WiSaGaN 's examples when publishing to an alternate registry.

I work for @cloudsmith-io and we provide hosted Cargo registries for our users so I'm trying to make sure we've implemented this behaviour correctly.

When a user is creating a library with the intent to publish it to an alternate registry, the assumption is (according to my reading of the docs) that any dependency specified without a registry should be assumed to be in the same registry to which the library is being published. If a user wishes their new library to depend on something from another registry (whether crates.io or some other alternate registry) then they must explicitly define this as per @WiSaGaN 's second example.

It is my understanding that crates.io is not privileged in this way and is treated like any other registry in this case (this is different when dealing with the Cargo.toml belonging to a package being built, but i'm talking about the case of a library here).

A lot of our users have had issues with the behaviour described here, and it'd be awesome if we were able to clarify once and for all how this should work.

EDIT: I've realised we're talking about different things here, I'm thinking of the registry's representation in the index, while @WiSaGaN is asking about Cargo.toml, so my request for clarification doesn't really make sense. Will open a separate issue elsewhere, sorry for the confusion.

@przygienda

so right now interesting problem I see, when I declare in private

[package]
name = "dummy-1"
version = "0.2.0"
edition = "2018"
publish = ["private1"]

[dependencies]
itertools= { version = "0.8", registry = "private2" }

and then make another package "dummy-2"

[package]
name = "dummy-3"
version = "0.1.0"
edition = "2018"
publish = ["private3"]

[dependencies]
dummy-1={ version = "0.2",  registry = "private1" }

When resolving dummy-2 with update, cargo tries to find itertools in private1 instead of private2! Looking at the dependency index, the registry is not stored, so no wonder.

Or am I missing something?
