Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Packages can inherit fields from their root workspace #9684

Closed
wants to merge 14 commits into from

Conversation

Muscraft
Copy link
Member

Tracking issue: #8415
RFC: rust-lang/rfcs#2906

All changes were heavily inspired by #8664
A big thanks to both @yaymukund and @ParkMyCar. I pulled a lot of changes from their branches.

All changes have been made according to the RFC as well as @alexcrichton's comment.

This makes it so that typically a workspace is found before parsing any manifests into packages. Allowing for reduced searching for a root workspace to pull things from. When looking for a path dependency's version (if not specified) there are situations where you have to find the root workspace to find the version but this appears to be very rare.

There are a few problem areas in cargo/src/cargo/util/toml/mod.rs where things get a bit clunky and verbose. I wasn't quite sure how to fix those spots and any help to those areas is appreciated. I also pulled how to join relative paths from this repo. I couldn't find a way within Cargo to join the relative paths correctly so I used that.

Please let me know if this isn't how you expected the changes to be made. I'll be happy to rework anything necessary.

Muscraft and others added 5 commits July 13, 2021 15:23
…has been found and added IntermediateManifest
- Added inheriting dependencies from Workspaces
- Added tests for inheriting TomlProject fields and dependencies
- Fixed tests as needed

Signed-off-by: Scott Schafer <23045215+Muscraft@users.noreply.github.com>
Signed-off-by: Scott Schafer <23045215+Muscraft@users.noreply.github.com>
@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 13, 2021
Muscraft and others added 2 commits July 14, 2021 16:27
Signed-off-by: Scott Schafer <23045215+Muscraft@users.noreply.github.com>
@alexcrichton
Copy link
Member

Thanks so much for helping to push this forward!

I've done a once-over of the PR and here's some thoughts I've got so far:

  • Could you add some tests for success about using dependencies from separate workspaces? It looks like there's a bunch of negative tests (for compile failures) which is great, but I think it'd also be good to test things where a git or a path dependency is used to make sure they load workspace values from the right place. Additionally I think it'd be good to test that there's two workspaces which have two different definitions of a workspace dependency foo, and we should be able to use those two workspaces together (e.g. the second is used via a git or a path dependency).
  • Could you add a test to ensure that the published manifest on crates.io has all of the workspace fields elaborated and filled in? I see there's tests for the JSON metadata but I think we'll probably want one for the manifest as well.
  • I was a bit sad how many places needed InheritableFields to get threaded around. This feature feels like it should ideally be largely isolated to TOML parsing and not have a ton of impact on other portions of the codebase, but threading this field around feels sort of error-prone and difficult to manage. It's also stored in structures like PackageRegistry which feels odd since that in theory doesn't configure TOML parsing much. One possible alternative to this I can think of is to have a cache of "manifest path to workspace fields" located in the Config. That way the toml parsing could consult this cache and lazily populate it as necessary.
  • I haven't read too too closely or confirmed this, but I'd be a little worried that TOML files are parsed multiple times with this implementation (namely the workspace root). Would you be up for adding some instrumentation to confirm that we parse manifests at most once?

I'm not 100% certain if it'll work out to pass this context around in fewer places, but if you have difficult trying it out I can try to dig in more as well.

@Muscraft
Copy link
Member Author

  • I'll try and add more tests to address those things. I'm not quite sure what you mean by having two different workspaces to be pulled from. Do you mean one package has two workspaces?

  • I'll add the test for the published manifest when I do the other tests

  • I agree that InheritableFields gets passed around too much and is in too many places. I couldn't come up with a good way to have it not be passed around as much. I'll look over my notes again to see if I can make having it in Config work or if there is another way I missed it. I think mapping how InheritableFields is being passed as well as why might yield some results into how to fit it in better.

  • You could parse a workspace root multiple times currently, this is mostly because we can't tell if it has been parsed before as well as even if it has you don't have a way to access the fields of it unless they are passed. I know a cache of what has been parsed has been turned down in the past but I think it may be helpful to have one like you described where you have a manifest path and then the fields associated in a map so you could look up if it has been parsed and either parse it or grab the fields. You could extend this to keeping the members of it so that if you encounter a manifest that does not specify a workspace but has a {workspace = true} you could check if it is stated as a member from the previous workspace and act accordingly. If done correctly I think that would allow you to get rid of most of the passing around of things as well as reduce the number of times you parse things overall. This would have a slightly higher memory consumption but it might be a worthwhile trade to not incur the penalty of parsing things multiple times. This feels very similar to @yaymukund's branch, so before I go ahead with this I would like to know if this is what you want or if I should search for another way.

@alexcrichton
Copy link
Member

Oh for two workspaces I mean a crate depending on a crate in a separate workspace. For example you could use a git dependency which is using these workspace directives, so for that dependency to work we'd have to elaborate the manifest of the git dependency. Similarly if you're using a path dependency (or, e.g. a [patch] pointing to a path) which is a member of a separate workspace then you'd also have to do workspace manifest elaboration on that second workspace.

For caches and such, I think that the exact shape of the solution doesn't matter too too much, but I think the main priorities should be:

  • Each TOML file is parsed at most once.
  • Handling of workspace = true should be a relatively low-level operation that doesn't affect the whole codebase.

I suspect that this implementation as-is might parse TOML files multiple times, and while the workspace = true stuff is pretty contained there's more passing around of InheritableFields than would otherwise be ideal. This isn't necessarily critical to fix but the parse-only-once I think is a blocker for merging.

I haven't tried to implement this myself so I don't know exactly what a solution might look like. I naively envisioned in #8664 that a cache wouldn't be necessary since in theory TOML parsing might be done within the context of "here's your workspace fields already for you" or something like that. That may not be true and a cache may be the easiest to implement. I don't want to give the impression that I'm expecting a specific shape of PR, though! If a cache works best I don't think there's any need to avoid it, I mostly just think the above two points are the main things to consider for implementing this.

@Muscraft
Copy link
Member Author

I know you mentioned how in PackageRegistry InheritableFields are there and they really shouldn't be. That is mostly because we aren't using a cache or something to hold things that have been parsed, so you must pass things around. For the most part, you will get InheritableFields before you parse a manifest. Most of the time they are used where there is one workspace and you parse it in the beginning and everything gets passed the correct things during parsing. Where all of the extra passing around comes from is when you need to parse it when you don't have access to the workspace before it. The cache may be able to reduce this some by being put in Config like you said since it is in most places already and for the most part, you have a Config available to look for the things you need. I'll look into this and update what I find early next week. I'll look specifically to make it so each TOML is only parsed one time and try and reduce where InheritableFields are.

@alexcrichton
Copy link
Member

I think a cache will work best here because workspaces can show up in lots of various locations. With path dependencies Cargo could have multiple path dependencies into a different workspace, and we wouldn't want to re-parse the workspace each time we discover that. Similarly if we have multiple dependencies on a git repository we wouldn't want to re-traverse and re-parse the workspace manifest in a git repo each time as well.

To handle these things I think a cache buried in Config will be relatively clean, basically a mapping from absolute path to InheritableFields which is lazily populated as we discover entries that have a specific workspace. This does mean that toml parsing will have to find the workspace root (or someone similar in that area will need to find a workspace root). That's an operation that Cargo doesn't currently do today. For example Cargo doesn't try to find a workspace root in a transitive path dependency or a git repo dependency. To get that working I think that the logic to find the workspace root will need to be tweaked, since the "ceiling" for where the workspace root can be is the root of the git repo for git checkouts, although for path dependencies there should be no ceiling.

I do think that we'll need to be careful to handle workspace root discovery in a somewhat clever fashion as well or otherwise all path and git dependencies will trigger their own search for a workspace root which may be inefficient.

@Muscraft
Copy link
Member Author

Currently, in src/cargo/util/toml/mod.rs, we search for the root workspace if needed, is this what you mean we should be doing?

One solution to searching each time for the root workspace could be to keep a list of a workspace's members so when you go to look for a root manifest you would first check if it's in a map of something identifying the member to the root manifest path. You could use that path to find the InheritableFields from the cache in Config. This could be populated after you parse a workspace root. I'm not sure if this is the most elegant solution but it would allow us to make progress until a better solution can be found.

@alexcrichton
Copy link
Member

Indeed yeah that's one location that I think may be multi-parsing manifests and multi-searching for a workspace root. I agree though that a cache can help the searching problem as well since we can record various paths in the cache as Cargo works its way upward in the filesystem

@bors
Copy link
Contributor

bors commented Jul 21, 2021

☔ The latest upstream changes (presumably #9632) made this pull request unmergeable. Please resolve the merge conflicts.

@joshtriplett joshtriplett added S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 3, 2021
…true

Signed-off-by: Scott Schafer <23045215+Muscraft@users.noreply.github.com>
# Conflicts:
#	src/cargo/core/source/source_id.rs
#	src/cargo/ops/cargo_package.rs
@bors
Copy link
Contributor

bors commented Aug 19, 2021

☔ The latest upstream changes (presumably #9793) made this pull request unmergeable. Please resolve the merge conflicts.

# Conflicts:
#	src/cargo/ops/cargo_install.rs
@ehuss
Copy link
Contributor

ehuss commented Sep 28, 2021

Hey @Muscraft, I just wanted to check in and see where you're at with this PR. I was just curious if you had any questions or major blockers.

@Muscraft
Copy link
Member Author

I've been busy with school and work but those things should resolve this week to allow me time to work on this. I have the initial implementation for the inheritable fields cache done. While doing that it appeared to me that with how things were being done currently there are a few more places that will still cause multiple parses of a toml file. I have found those areas and am hoping to be able to push all of the changes in the next week or so.

- Changed parsing of manifests, so they only get parsed once
- Removed InheritableFields from places it does not belong
- Updated tests

Signed-off-by: Scott Schafer <23045215+Muscraft@users.noreply.github.com>
# Conflicts:
#	tests/testsuite/build_script.rs
Signed-off-by: Scott Schafer <23045215+Muscraft@users.noreply.github.com>
Signed-off-by: Scott Schafer <23045215+Muscraft@users.noreply.github.com>
@Muscraft
Copy link
Member Author

Everything should only be parsed once now. I did not add the cache for workspace member lookup but I can add that if it is needed. There are a few areas that may need a little touching up but I figured it was better to get the core ideas reviewed than touch up what was needed. I purged most places of InheritableFields so it shouldn't be everywhere now.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update here! In the interest of making progress, I'd like to get your thoughts on the possibility of splitting this up. This is a many-months-old PR and is quite difficult to reboot all the context all the time each time I come back to this. Additionally this is a huge change at nearly 3k lines and it's very difficult to productively give feedback on such a large review because I'm sure it can feel overwhelming on your end as well.

My suggestion would be to probably close this PR and start with a fresh one. I think that we should sequence the implementation of this RFC incrementally so it doesn't have to go in all-at-once. This I think will provide a better opportunity to review this in detail and make sure we're building on solid foundations. I in general want to be quite careful with the implementation of this RFC since it touches a lot of very core parts of Cargo and we want to make sure that it's all on solid ground for the future.

Specifically I think this could at least be split up into:

  1. First updates to the TOML parsing code. Somewhere a method would take &InheritableFields (or similar) but that's actually a blank struct. There's some method (I recommended a resolve method here) which would always return None (as if it weren't configured in the workspace) which would always return an error. This means that the change isn't actually test-able yet but it will help get a lot of the structure for TOML parsing in place and how we handle elaboration. Notably this change would not include actually searching for the workspace root or creation of InheritableFields.
  2. Next would be a PR to actually create and pass in an InheritableFields value. This would involve managing all the workspace changes and figuring out where to put all that. The main goal here is that the TOML parsing is all "taken care of" and all we really need to do is figure out how to calculate the parameter to pass into "The Function" which does the parsing.
  3. Finally would be other miscellaneous things like fixing up paths on publish, making sure published things are all in order, etc.

I don't think that this can be split up into a ton of parts, but I do really think it will be beneficial to split this up at least into toml parsing and workspace-field-finding because I find it difficult to keep it all in my head right now and I don't want to overwhelm you with review feedback as well.

@@ -203,7 +215,35 @@ impl<'de, P: Deserialize<'de>> de::Deserialize<'de> for TomlDependency<P> {
V: de::MapAccess<'de>,
{
let mvd = de::value::MapAccessDeserializer::new(map);
DetailedTomlDependency::deserialize(mvd).map(TomlDependency::Detailed)
let details: IntermediateDependency<P> = IntermediateDependency::deserialize(mvd)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the only purpose for this custom Deserialize implementation is to provide a better error string with the expecting method. Having an entirely duplicated IntermediateDependency below I think puts this a bit over the top to the point that we should figure out an alternative method of doing so. Ideally we could go back to #[serde(untagged)] on TomlDependency and just go from there.

Would you be up for seeing if serde has a better way to customize the error string nowadays or whether there's an alternative way to do this without duplicating so much?

}

/// Parses an optional field, defaulting to the workspace's value.
fn ws_default<T, F>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I think would be a good candidate to be a method on MaybeWorkspace perhaps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think that the signature of this function should be:

impl<T> MaybeWorkspace<T> {
    fn resolve(self, label: &str, get_ws: impl FnOnce() -> &T) -> CargoResult<T>;
}

Basically by having this as a method it removes the Option layer, and I don't think that the return value should be MaybeWorkspace but rather always simply T because the purpose of this method is to extract the T from the MaybeWorkspace one way or another.


#[derive(Debug, Deserialize)]
#[serde(untagged)]
enum MaybeWorkspaceBadge {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come badges needed separate handling for this? If it's due to some odd bug somewhere else or something like that I think it'd be fine to not include badges in the first implementation here and to instead have this in a follow-up PR (this one's already quite large)

match (value, workspace) {
(None, _) => Ok(None),
(Some(MaybeWorkspace::Defined(value)), _) => Ok(Some(MaybeWorkspace::Defined(value))),
(Some(MaybeWorkspace::Workspace(TomlWorkspaceField { workspace: true })), ws) => f(ws)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this is roughly the location I would expect some feature-checking to happen along the lines of "you need to enable this nightly feature in your manifest to use this"

@@ -817,14 +894,144 @@ impl<'de> de::Deserialize<'de> for VecStringOrBool {
}
}

fn version_trim_whitespace<'de, D>(deserializer: D) -> Result<semver::Version, D::Error>
pub fn map_deps(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come this moved around?

workspace: true,
}))
} else {
Err(de::Error::custom("workspace cannot be false"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is already checked in other locations, so does it need to be checked here as well?


/// This type is used to deserialize `Cargo.toml` files.
#[derive(Debug, Clone)]
pub struct IntermediateManifest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this structure may want to be defined in the toml parsing area, otherwise it's pretty back-and-forth where it's created here but all the guts of processing happen later with TomlManifest::to_real_manifest.

Also, would it be possible for TomlManifest to itself be an "intermediate" manifest?

let project = me.project.clone().or_else(|| me.package.clone());
let project = &mut project.ok_or_else(|| anyhow!("no `package` section found"))?;

let workspace_config = match (me.workspace.as_ref(), project.workspace.as_ref()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks duplicated with the to_intermediate method?

}
};

TomlManifest::parse_toml_project(project, &inheritable)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the primary locations where I don't think this is quite the right model to implement this. I think it would be best to leave the fields as MaybeWorkspace<T> within the manifest and then whenever they're read in this function they call the resolve function I suggested above instead of altering the structure of the manifest itself and then reading all the Defined states.


TomlManifest::parse_toml_project(project, &inheritable)?;

project.license_file = match (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it can be done I think it would be best to represent this path-handling in the type of the field in the TomlManifest itself. That would remove the duplication here with license/readmes and encode more information in the type structure of the deserialized type.

@Muscraft
Copy link
Member Author

Muscraft commented Oct 14, 2021

I am going to leave the final decision of splitting this up or not to you. That being said I agree that this needs to be done in a careful way as it touches many of the core parts of cargo. That is one of the main problems with this RFC. It takes a system that works in a linear fashion and adds components of nonlinearity. This makes changes to many cores areas and causes a lot of the headache.

I am willing to deal with the changes but I understand that it is a lot to comment on so I will be happy to split this up to make it easier to review. If we were to split this up I would like more of a description of what is to be done in each part. My current understanding is in part 1 we would

  • Introduce InheritableFields as an empty struct
  • Set up the resolving of InheritableFields to always return none
  • Change TomlManifest to be similar to the intermediate version or something similar as you suggested
  • We would leave the fields as MaybeWorkspace<T> within the manifest and then whenever they're read in they call the resolve function as you suggested

I think this might be able to be expanded upon so that

  • InheritableFields would be introduced but be the same as now
  • TomlManifest would be changed into the "intermediate" version or something similar as you suggested
  • We would leave the fields as MaybeWorkspace<T> within the manifest and then whenever they're read they call the resolve function as you suggested
  • The resolve function actually "resolves" the field, since no manifest currently will have a { workspace = true } and if it does it will return an error

I think it makes this whole process a bit more testable as we can test the parsing of InheritableFields for a workspace as well as test for { workspace = true } by looking for an error in the test. You also could make it so that if the package is the workspace root it could have { workspace = true }. This would also allow testing of the resolve behavior to some degree giving more confidence that things are working.

This makes it so all that is needed in part 2 is to set up finding the workspace root, set up the cache of InheritableFields as well as the cache for known members of a workspace to reduce the searching for a workspace root. All of that functionality could be added to resolve since it would be called one at a time and it could search for the workspace root and add it to the cache.

I may have oversimplified things a bit but I am happy to discuss this more either on here, in Zulip, on a call, or another communication method.

@alexcrichton
Copy link
Member

That sounds reasonable to me, yeah. I'm ok with that split! My goal is to basically to trim this down to being more manageable to land in incremental pieces so review/feedback can be more effective.

@bors
Copy link
Contributor

bors commented Nov 14, 2021

☔ The latest upstream changes (presumably #10081) made this pull request unmergeable. Please resolve the merge conflicts.

@blairjam
Copy link

Is this PR still being worked on?

@Muscraft
Copy link
Member Author

Yes, I am hoping after the holiday season I can open the new PR

@alexcrichton
Copy link
Member

Thanks for the PR, but I'm going to be stepping down from the Cargo team so I'm going to un-assign myself from this. The Cargo team will help review this when they get a chance.

@Muscraft
Copy link
Member Author

Closed in favor of This PR

@Muscraft Muscraft closed this Mar 22, 2022
bors added a commit that referenced this pull request Mar 25, 2022
Part 1 of RFC2906 - Packages can inherit fields from their root workspace

Tracking issue: #8415
RFC: rust-lang/rfcs#2906

All changes were heavily inspired by #8664 and #9684

A big thanks to both `@yaymukund` and `@ParkMyCar.` I pulled a lot of inspiration from their branches.

I would also like to thank `@alexcrichton` for all the feedback on the first attempt. It has brought this into a much better state.

All changes have been made according to the RFC as well as `@alexcrichton's` [comment](#8664 (comment)).

This is part 1 of many, as described by [this comment](#9684 (comment)), [this comment](#9684 (review)) and redefined  [by this one](#10497 (comment)).

This PR focuses on inheriting in root package, including:
- Add `MaybeWorkspace<T>` to allow for `{ workspace = true }`
- Add a new variant to `TomlDependency` to allow inheriting dependencies from a workspace
- Add `InheritableFields` so that we have somewhere to get a value from when we `resolve` a `MaybeWorkspace<T>`
  - `license_file` and `readme` are in `InheritableFields` in this part but are not `MaybeWorkspace` for reasons [described here](#10497 (comment))
- Add a method to `resolve` a `MaybeWorkspace<T>` into a `T` that can fail if we have nowhere to pull from or the workspace does not define that field
- Disallow inheriting from a different `Cargo.toml`
  - This means that you can only inherit from inside the same `Cargo.toml`, i.e. it has both a `[workspace]` and a `[package]`
  - Forcing this requirement allows us to test the implementations of `MaybeWorkspace<T>` and the new `TomlDependency` variant without having to add searching for a workspace root and the complexity with it

Remaining implementation work for the RFC
- Support inheriting in any workspace member
- Correctly inherit fields that are relative paths
  - Including adding support for inheriting `license_file`,  `readme`, and path-dependencies
-  Path dependencies infer version directive
- Lock workspace dependencies and warn when unused
- Optimizations, as needed
- Evaluate any new fields for being inheritable (e.g. `rust-version`)
- Evaluate making users of `{ workspace = true }` define the workspace they pull from in `[package]`

Areas of concern:
- `TomlDependency` Deserialization is a mess, and I could not figure out how to do it in a cleaner way without significant headaches. I ended up having to do the same thing as last time [which was an issue](#9684 (comment)).
- Resolving on a `MaybeWorkspace` feels extremely verbose currently:
  ```rust
  project.homepage.clone().map(|mw| mw.resolve(&features, "homepage", || get_ws(inheritable)?.homepage())).transpose()?,
  ```
  This way allows for lazy resolution of inheritable fields and finding a workspace (in part 2) so you do not pay a penalty for not using this feature. The other bit of good news is `&features` will go away as it is just feature checking currently.

- This feature requires some level of duplication of code as well as remaking the original `TomlManifest` which adds extra length to the changes.
- This feature also takes a linear process and makes it potentially non-linear which adds lots of complexity (which will get worse in Part 2)

Please let me know if you have any feedback or suggestions!
@Muscraft Muscraft deleted the dedup_workspace branch May 27, 2022 01:15
@Muscraft Muscraft restored the dedup_workspace branch May 27, 2022 01:15
@Muscraft Muscraft deleted the dedup_workspace branch May 27, 2022 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants