Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A visitor API for exploring a package's features and their dependencies #281

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

regexident
Copy link
Contributor

An exploration for how Metadata could provide a flexible API for processing a package's feature dependency graph.

Resolves #279

@regexident regexident force-pushed the package-features-api branch from a2ded76 to 161d86c Compare December 4, 2024 19:21
Copy link
Collaborator

@obi1kenobi obi1kenobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'm unfortunately not a fan of this API. I find it very confusingly structured and documented, and it isn't clear to me that the flexibility a visitor offers compared to other approaches is valuable for many use cases.

I can imagine two modes where feature information is useful:

  • Which things are directly enabled by which features? This is analogous to "what's written in the Cargo.toml next to each feature."
  • What are all the things enabled by each feature? This is analogous to "if I enable feature X, what do I actually get."

Obviously, a visitor can cover both modes and everything in between. But it isn't clear to me that there is any use case that lies in between. So I can't justify the complexity of either maintaining this API, nor of users learning how to use it.

For my 2 cents, I'd prefer an API that offered functionality for each of the two modes, and was designed to be simple and easy to understand.

src/visit.rs Outdated
Comment on lines 10 to 18
/// Visits a missing dependency.
///
/// Return `Ok(())` to continue the walk, or `Err(…)` to abort it.
fn visit_missing_dependency(&mut self, dep_name: &str) -> Result<(), Self::Error>;

/// Visits a missing package.
///
/// Return `Ok(())` to continue the walk, or `Err(…)` to abort it.
fn visit_missing_package(&mut self, pkg_name: &str) -> Result<(), Self::Error>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about the terminology of missing dependency and missing package. In what sense are they missing, given that we seem to know their names based on the function signatures?

While all the trait items are documented, it would be good to frame their documentation in terms of "what does the user need to understand to choose a good implementation here." Right now the documentation seems to be written from the perspective of "what does the code driving the visitor need to know to drive it properly", which isn't that helpful for end-users.

Copy link
Contributor Author

@regexident regexident Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation was mostly copied from the corresponding APIs on cargo itself, but the names used in the implementation slightly drifted afterwards. I've cleaned things up and also added more information. 👍

The need for visit_missing_dependency(…) and visit_missing_package(…) is twofold:

  1. The Metadata only contains packages for crates that were part of the active crate graph, i.e. crates that either were workspace members, non-optional dependencies or optional dependencies that where enabled via corresponding features. So if you walk the feature graph from a feature that would enable a dependency but that dependency wasn't also activated via MetadataCommand.features(…), then lookup of the corresponding Package on Metadata will fail. But we need the Package to look up its features for the recursion. (I'd expect visit_missing_dependency(…) to be unnecessary for valid Metadata instances and haven't seen it fail myself so far.)

  2. The Metadata might have been deserialized from an unchecked JSON blob with packages/dependencies missing. I'll leave deciding on whether or not we want to gracefully handle such situations to @oli-obk, but wanted to avoid pestering the draft implementation with .unwrap()s.

These failure modes are also partly what made me switch to a visitor API.

I had initially implemented a single method on Metadata but the implementation very quickly became unwieldy and inflexible:

fn transitive_feature_dependencies<'a>(&'a self, package: &'a Package, feature_name: &str) -> Result<BTreeMap<&'a Package, BTreeSet<FeatureValue>>> {
    // ...
}

(When I initially added said method on Metadata itself I stumbled upon the issue of expecting it to return -> Result<…>, which in the crate is an alias for -> Result<…, cargo_metadata::Error>, with the latter being an error type that so far is 100% concerned with I/O-layer errors. Adding such new logic-layer errors to the same error type felt wrong to me for some reason. Mostly because I'd expect 99% of users to only want to read a manifest file. Those would now have to figure out how to properly deal with such additional, yet unreachable error cases. Returning a different error type from these new APIs however would be confusing as well, as Result<…> implies that it covers all errors emitted by the crate. But maybe this just hints at the errors needing some work to be done as well, to make things fit well. But this feels to me like it needs more of a "broader vision" to get right and feels somewhat out of scope for this PR.)

But whether or not this method should fail on missing packages depends on the use-case, I think.
With a visitor this decision is totally up to you, without bloating the crate's API with options.

Similarly one may want to be able to pick between different collection modes: recursive (i.e. transitive) and non-recursive (i.e. immediate) feature dependencies. Again, a visitor API leaves this up to the user, without bloating the crate's API with options.

src/visit.rs Outdated Show resolved Hide resolved
src/visit.rs Outdated
Comment on lines 33 to 42
/// Visits a feature that's enabling a dependency with `dep:dep_name` syntax.
///
/// Return `Ok(<walk>)` where `<walk>` indicates whether or not to walk the feature's downstream dependencies,
/// or `Err(…)` to abort the walk.
fn visit_dep(&mut self, package: &Package, dep_name: &str) -> Result<bool, Self::Error> {
let (..) = (package, dep_name);
Ok(true)
}

/// Visits a feature that's enabling a feature on a dependency with `crate_name/feat_name` syntax.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these doc comments say "visits a feature ..." but they both seem to take package: &Package and nothing resembling a feature. I'm confused about how this interface works =/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three visitor methods correspond to the variants of the FeatureValue type (which was copied from cargo itself). Every time the walker passes such a value in the feature graph it calls the corresponding method on the visitor that it carries along.

The variants of said type unfortunately have rather confusing names already: Feature, Dep, DepFeature, especially when considering that FeatureValue is itself the value of a "feature dependency" (i.e. a feature depended on by another feature). Finding proper names for these was … a challenge. One that I decided to mostly gloss over for now. #bike-shedding.

Preferably the FeatureValue enum would have new-type variants, rather than struct variants, so that each variant's type could be passed to the corresponding visitor method and also so that the corresponding type of feature value could be handled as a single value. But! By promoting each variant to individual types each of them would have to have their own Serialize/Deserialize/FromStr implementations. An explosion of complexity and change that I wanted to avoid if possible. That said I do think that the end goal should be a new-type based enum.

src/dependency.rs Outdated Show resolved Hide resolved
@regexident
Copy link
Contributor Author

Personally, I'm unfortunately not a fan of this API. I find it very confusingly structured and documented, and it isn't clear to me that the flexibility a visitor offers compared to other approaches is valuable for many use cases.

I can imagine two modes where feature information is useful:

  • Which things are directly enabled by which features? This is analogous to "what's written in the Cargo.toml next to each feature."
  • What are all the things enabled by each feature? This is analogous to "if I enable feature X, what do I actually get."

Obviously, a visitor can cover both modes and everything in between. But it isn't clear to me that there is any use case that lies in between. So I can't justify the complexity of either maintaining this API, nor of users learning how to use it.

For my 2 cents, I'd prefer an API that offered functionality for each of the two modes, and was designed to be simple and easy to understand.

I'd 100% agree if such a Visitor API would be the only API provided.
For common use-cases I would expect the crate to provide batteries-included "collectors", like the TransitiveFeatureCollector that I use in the tests.

Going with a visitor API doesn't have to mean "difficult to use". Instead it allows you to neatly compartmentalize the logic in a "walker" and avoid diluting the API of Metadata with methods and options for all the different usage needs.

Also speaking from an implementation point of view: migrating to a visitor made the implementation of the feature graph traversal logic much simpler and cleaner. And by being driven from outside having a visitor also makes the implementation less likely to accumulate all kinds of intricate control logic (to covered different use cases, etc) in the future.

@obi1kenobi
Copy link
Collaborator

Pulling your comment out of the thread for broader visibility:

But whether or not this method should fail on missing packages depends on the use-case, I think.
With a visitor this decision is totally up to you, without bloating the crate's API with options.

Similarly one may want to be able to pick between different collection modes: recursive (i.e. transitive) and non-recursive (i.e. immediate) feature dependencies. Again, a visitor API leaves this up to the user, without bloating the crate's API with options.

I don't find this argument convincing, personally. Leaving more things up to the user is not always a good choice.

cargo_metadata already has low-level functionality that supports all the possible special cases and leaves maximum choice up to the users. I don't think it's worth adding a new and very complex layer that also tries to be super generic.

I think what's missing is a simple way to answer the most common questions:

  • What is directly enabled by this feature?
  • What are all things transitively enabled by this feature?

Something a user could easily find, quickly understand, and immediately use in their own code. A visitor API in my book accomplishes none of that.

I'm not the cargo_metadata maintainer so the final call isn't mine to make. But as a user of cargo_metadata, I wouldn't switch from what I currently have to a visitor-based API. It would make everything more complex for me to understand and maintain, not less.

@regexident
Copy link
Contributor Author

regexident commented Dec 5, 2024

I'm not the cargo_metadata maintainer so the final call isn't mine to make. But as a user of cargo_metadata, I wouldn't switch from what I currently have to a visitor-based API. It would make everything more complex for me to understand and maintain, not less.

And I wouldn't expect you to. For the 80% I'd expect convenient methods to exist. The for other 20% a visitor API is preferable.

One such use case being mine (the one you reminded me of, when you brought up your use case): I'd need to be able to selectively block certain downstream features from (it and its dependencies) being collected. With a method that only returns the immediate dependencies I'd be out of luck as I would have to write the recursive traversal logic myself, which is arguably more difficult (to do right) than implementing a visitor. And a one-off method for transitive feature dependencies wouldn't be of much use either.

@oli-obk
Copy link
Owner

oli-obk commented Dec 9, 2024

I also feel like the visitor is overkill, but until we get generators, I can also see how an iterator based version may be annoying to implement.

I'd be fine adding this under an unstable_visitor feature gate, and using it to implement a higher level API (along with giving you the ability to play with it directly), but I'm hesitant to support a visitor API forever.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide higher-level API around a Package's features?
3 participants