A visitor API for exploring a package's features and their dependencies #281

regexident · 2024-12-04T18:56:18Z

An exploration of how Metadata could provide a flexible API for processing a package's feature dependency graph.

Resolves #279

obi1kenobi

Personally, I'm unfortunately not a fan of this API. I find it very confusingly structured and documented, and it isn't clear to me that the flexibility a visitor offers compared to other approaches is valuable for many use cases.

I can imagine two modes where feature information is useful:

Which things are directly enabled by which features? This is analogous to "what's written in the Cargo.toml next to each feature."
What are all the things enabled by each feature? This is analogous to "if I enable feature X, what do I actually get."

Obviously, a visitor can cover both modes and everything in between. But it isn't clear to me that there is any use case that lies in between. So I can't justify the complexity of maintaining this API, nor that of users learning how to use it.

For my 2 cents, I'd prefer an API that offered functionality for each of the two modes, and was designed to be simple and easy to understand.
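To make the two modes concrete, here is a minimal sketch over a plain feature table. It assumes the table has the shape `BTreeMap<String, Vec<String>>` (feature name mapped to the raw values listed next to it); the function names `directly_enabled` and `transitively_enabled` are hypothetical and not part of cargo_metadata.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Feature table shaped like a `[features]` section:
/// feature name -> the raw values listed next to it.
type FeatureTable = BTreeMap<String, Vec<String>>;

/// Mode 1: exactly what is written next to `feature` in the manifest.
/// (Hypothetical helper, not an existing cargo_metadata function.)
fn directly_enabled<'a>(features: &'a FeatureTable, feature: &str) -> &'a [String] {
    features.get(feature).map(|values| values.as_slice()).unwrap_or(&[])
}

/// Mode 2: everything reachable from `feature` within this one table.
/// Values such as `dep:foo` or `foo/bar` are recorded but not expanded,
/// because expanding them needs the dependency's own feature table.
fn transitively_enabled(features: &FeatureTable, feature: &str) -> BTreeSet<String> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![feature.to_owned()];
    while let Some(current) = stack.pop() {
        for value in directly_enabled(features, &current) {
            // `insert` returns false for values already seen, which also guards against cycles.
            if seen.insert(value.clone()) {
                stack.push(value.clone());
            }
        }
    }
    seen
}
```

This only covers a single package; following `crate_name/feat_name` values across packages is where the missing-package question discussed further down comes in.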

obi1kenobi · 2024-12-05T00:45:04Z

src/visit.rs

+    /// Visits a missing dependency.
+    ///
+    /// Return `Ok(())` to continue the walk, or `Err(…)` to abort it.
+    fn visit_missing_dependency(&mut self, dep_name: &str) -> Result<(), Self::Error>;
+
+    /// Visits a missing package.
+    ///
+    /// Return `Ok(())` to continue the walk, or `Err(…)` to abort it.
+    fn visit_missing_package(&mut self, pkg_name: &str) -> Result<(), Self::Error>;


I'm confused about the terminology of missing dependency and missing package. In what sense are they missing, given that we seem to know their names based on the function signatures?

While all the trait items are documented, it would be good to frame their documentation in terms of "what does the user need to understand to choose a good implementation here." Right now the documentation seems to be written from the perspective of "what does the code driving the visitor need to know to drive it properly", which isn't that helpful for end-users.

The documentation was mostly copied from the corresponding APIs on cargo itself, but the names used in the implementation slightly drifted afterwards. I've cleaned things up and also added more information. 👍

The need for visit_missing_dependency(…) and visit_missing_package(…) is twofold:

The Metadata only contains packages for crates that were part of the active crate graph, i.e. crates that were either workspace members, non-optional dependencies, or optional dependencies that were enabled via corresponding features. So if you walk the feature graph from a feature that would enable a dependency, but that dependency wasn't also activated via MetadataCommand.features(…), then the lookup of the corresponding Package on Metadata will fail. But we need the Package to look up its features for the recursion. (I'd expect visit_missing_dependency(…) to be unnecessary for valid Metadata instances and haven't seen it fail myself so far.)

The Metadata might have been deserialized from an unchecked JSON blob with packages/dependencies missing. I'll leave deciding on whether or not we want to gracefully handle such situations to @oli-obk, but wanted to avoid pestering the draft implementation with .unwrap()s.

These failure modes are also partly what made me switch to a visitor API.

I had initially implemented a single method on Metadata but the implementation very quickly became unwieldy and inflexible:

fn transitive_feature_dependencies<'a>(
    &'a self,
    package: &'a Package,
    feature_name: &str,
) -> Result<BTreeMap<&'a Package, BTreeSet<FeatureValue>>> {
    // ...
}

(When I initially added said method on Metadata itself I stumbled upon the issue of expecting it to return -> Result<…>, which in the crate is an alias for -> Result<…, cargo_metadata::Error>, with the latter being an error type that so far is 100% concerned with I/O-layer errors. Adding such new logic-layer errors to the same error type felt wrong to me for some reason. Mostly because I'd expect 99% of users to only want to read a manifest file. Those would now have to figure out how to properly deal with such additional, yet unreachable error cases. Returning a different error type from these new APIs however would be confusing as well, as Result<…> implies that it covers all errors emitted by the crate. But maybe this just hints at the errors needing some work to be done as well, to make things fit well. But this feels to me like it needs more of a "broader vision" to get right and feels somewhat out of scope for this PR.)

But whether or not this method should fail on missing packages depends on the use-case, I think.
With a visitor this decision is totally up to you, without bloating the crate's API with options.

Similarly one may want to be able to pick between different collection modes: recursive (i.e. transitive) and non-recursive (i.e. immediate) feature dependencies. Again, a visitor API leaves this up to the user, without bloating the crate's API with options.
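As a rough illustration of "the decision is up to the visitor", here is a self-contained sketch. Everything in it is a simplified assumption rather than the PR's actual code: `Pkg` stands in for `Package`, the trait has only two methods, `dep:` values are skipped, and `walk` is a hypothetical driver. The two collectors differ only in how they treat a missing package.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified stand-in for a package: a name plus its feature table.
struct Pkg {
    name: String,
    features: BTreeMap<String, Vec<String>>,
}

/// Hypothetical visitor, loosely modeled on the trait discussed in this PR.
trait FeatureVisitor {
    type Error;
    /// Called for each feature; return `Ok(false)` to skip its dependencies.
    fn visit_feature(&mut self, pkg: &Pkg, feature: &str) -> Result<bool, Self::Error>;
    /// Called when a `crate_name/feat_name` value names a package that is not in the graph.
    fn visit_missing_package(&mut self, pkg_name: &str) -> Result<(), Self::Error>;
}

/// Hypothetical driver: walks `feature` of `pkg`, letting the visitor steer.
/// Cycle protection is the visitor's job (see `visit_feature` below).
fn walk<V: FeatureVisitor>(
    pkgs: &BTreeMap<String, Pkg>,
    pkg: &Pkg,
    feature: &str,
    visitor: &mut V,
) -> Result<(), V::Error> {
    if !visitor.visit_feature(pkg, feature)? {
        return Ok(());
    }
    for value in pkg.features.get(feature).into_iter().flatten() {
        if value.starts_with("dep:") {
            continue; // skipped to keep the sketch short
        }
        match value.split_once('/') {
            Some((dep, feat)) => {
                let dep = dep.trim_end_matches('?'); // tolerate `dep?/feat`
                match pkgs.get(dep) {
                    Some(dep_pkg) => walk(pkgs, dep_pkg, feat, visitor)?,
                    None => visitor.visit_missing_package(dep)?,
                }
            }
            None => walk(pkgs, pkg, value, visitor)?,
        }
    }
    Ok(())
}

/// Tolerates missing packages and just remembers their names.
#[derive(Default)]
struct LenientCollector {
    seen: BTreeSet<(String, String)>,
    missing: Vec<String>,
}

impl FeatureVisitor for LenientCollector {
    type Error = std::convert::Infallible;
    fn visit_feature(&mut self, pkg: &Pkg, feature: &str) -> Result<bool, Self::Error> {
        // Only descend the first time a (package, feature) pair is seen.
        Ok(self.seen.insert((pkg.name.clone(), feature.to_owned())))
    }
    fn visit_missing_package(&mut self, pkg_name: &str) -> Result<(), Self::Error> {
        self.missing.push(pkg_name.to_owned());
        Ok(())
    }
}

/// Aborts the walk on the first missing package.
#[derive(Default)]
struct StrictCollector {
    seen: BTreeSet<(String, String)>,
}

impl FeatureVisitor for StrictCollector {
    type Error = String;
    fn visit_feature(&mut self, pkg: &Pkg, feature: &str) -> Result<bool, Self::Error> {
        Ok(self.seen.insert((pkg.name.clone(), feature.to_owned())))
    }
    fn visit_missing_package(&mut self, pkg_name: &str) -> Result<(), Self::Error> {
        Err(format!("package `{}` is not part of the metadata", pkg_name))
    }
}
```

A non-recursive (immediate) collector would fit the same shape by returning `Ok(false)` from `visit_feature` for everything below the starting feature.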

src/visit.rs

obi1kenobi · 2024-12-05T00:48:37Z

src/visit.rs

+    /// Visits a feature that's enabling a dependency with `dep:dep_name` syntax.
+    ///
+    /// Return `Ok(<walk>)` where `<walk>` indicates whether or not to walk the feature's downstream dependencies,
+    /// or `Err(…)` to abort the walk.
+    fn visit_dep(&mut self, package: &Package, dep_name: &str) -> Result<bool, Self::Error> {
+        let (..) = (package, dep_name);
+        Ok(true)
+    }
+
+    /// Visits a feature that's enabling a feature on a dependency with `crate_name/feat_name` syntax.


Both of these doc comments say "visits a feature ..." but they both seem to take package: &Package and nothing resembling a feature. I'm confused about how this interface works =/

The three visitor methods correspond to the variants of the FeatureValue type (which was copied from cargo itself). Every time the walker passes such a value in the feature graph it calls the corresponding method on the visitor that it carries along.
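For readers unfamiliar with the three shapes, the following sketch shows how a raw value from a `[features]` list maps onto them and how a walker might dispatch to one visitor method per shape. The enum mirrors the variant names mentioned here, but the field names, the `parse` and `dispatch` functions, and the `Visitor` trait are assumptions for illustration, not the PR's API.

```rust
/// The three shapes a value in a `[features]` list can take.
#[derive(Debug, PartialEq)]
enum FeatureValue<'a> {
    /// `feat_name`: another feature of the same package.
    Feature(&'a str),
    /// `dep:dep_name`: enables an optional dependency.
    Dep { dep_name: &'a str },
    /// `dep_name/feat_name` or `dep_name?/feat_name` (weak).
    DepFeature { dep_name: &'a str, dep_feature: &'a str, weak: bool },
}

/// Hypothetical parser for the raw string form.
fn parse(value: &str) -> FeatureValue<'_> {
    if let Some(dep_name) = value.strip_prefix("dep:") {
        return FeatureValue::Dep { dep_name };
    }
    match value.split_once('/') {
        Some((dep, dep_feature)) => {
            let (dep_name, weak) = match dep.strip_suffix('?') {
                Some(name) => (name, true),
                None => (dep, false),
            };
            FeatureValue::DepFeature { dep_name, dep_feature, weak }
        }
        None => FeatureValue::Feature(value),
    }
}

/// Hypothetical visitor with one method per variant (names are placeholders).
trait Visitor {
    fn visit_feature(&mut self, name: &str);
    fn visit_dep(&mut self, dep_name: &str);
    fn visit_dep_feature(&mut self, dep_name: &str, dep_feature: &str, weak: bool);
}

/// What the walker does each time it passes a value in the feature graph.
fn dispatch(value: &str, visitor: &mut dyn Visitor) {
    match parse(value) {
        FeatureValue::Feature(name) => visitor.visit_feature(name),
        FeatureValue::Dep { dep_name } => visitor.visit_dep(dep_name),
        FeatureValue::DepFeature { dep_name, dep_feature, weak } => {
            visitor.visit_dep_feature(dep_name, dep_feature, weak)
        }
    }
}

/// Example visitor that records which method was called.
#[derive(Default)]
struct Recorder(Vec<String>);

impl Visitor for Recorder {
    fn visit_feature(&mut self, name: &str) {
        self.0.push(format!("feature {}", name));
    }
    fn visit_dep(&mut self, dep_name: &str) {
        self.0.push(format!("dep {}", dep_name));
    }
    fn visit_dep_feature(&mut self, dep_name: &str, dep_feature: &str, weak: bool) {
        self.0.push(format!("dep-feature {} {} weak={}", dep_name, dep_feature, weak));
    }
}
```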

The variants of said type unfortunately have rather confusing names already: Feature, Dep, DepFeature, especially when considering that FeatureValue is itself the value of a "feature dependency" (i.e. a feature depended on by another feature). Finding proper names for these was … a challenge. One that I decided to mostly gloss over for now. #bike-shedding.

Preferably the FeatureValue enum would have new-type variants, rather than struct variants, so that each variant's type could be passed to the corresponding visitor method and also so that the corresponding type of feature value could be handled as a single value. But! By promoting each variant to individual types each of them would have to have their own Serialize/Deserialize/FromStr implementations. An explosion of complexity and change that I wanted to avoid if possible. That said I do think that the end goal should be a new-type based enum.
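A small sketch of the difference between the two enum shapes, under the assumption that the struct variants carry roughly these fields. `TypedFeatureValue` and `describe_dep_feature` are made-up names used only to show what a new-type based enum would allow.

```rust
/// Struct-variant shape (field names are illustrative assumptions).
#[derive(Debug, Clone, PartialEq)]
enum FeatureValue {
    Feature(String),
    Dep { dep_name: String },
    DepFeature { dep_name: String, dep_feature: String, weak: bool },
}

// New-type shape: every variant wraps a standalone type.
#[derive(Debug, Clone, PartialEq)]
struct Feature(String);

#[derive(Debug, Clone, PartialEq)]
struct Dep {
    dep_name: String,
}

#[derive(Debug, Clone, PartialEq)]
struct DepFeature {
    dep_name: String,
    dep_feature: String,
    weak: bool,
}

#[derive(Debug, Clone, PartialEq)]
enum TypedFeatureValue {
    Feature(Feature),
    Dep(Dep),
    DepFeature(DepFeature),
}

impl From<FeatureValue> for TypedFeatureValue {
    fn from(value: FeatureValue) -> Self {
        match value {
            FeatureValue::Feature(name) => TypedFeatureValue::Feature(Feature(name)),
            FeatureValue::Dep { dep_name } => TypedFeatureValue::Dep(Dep { dep_name }),
            FeatureValue::DepFeature { dep_name, dep_feature, weak } => {
                TypedFeatureValue::DepFeature(DepFeature { dep_name, dep_feature, weak })
            }
        }
    }
}

/// With standalone types, a visitor method can take the whole value at once
/// instead of a loose list of fields.
fn describe_dep_feature(value: &DepFeature) -> String {
    let weak = if value.weak { "?" } else { "" };
    format!("{}{}/{}", value.dep_name, weak, value.dep_feature)
}
```

The cost mentioned above is visible even here: each standalone type would need its own Serialize/Deserialize/FromStr implementations, which this sketch leaves out.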

src/dependency.rs

regexident · 2024-12-05T16:29:31Z

Personally, I'm unfortunately not a fan of this API. I find it very confusingly structured and documented, and it isn't clear to me that the flexibility a visitor offers compared to other approaches is valuable for many use cases.

I can imagine two modes where feature information is useful:

Which things are directly enabled by which features? This is analogous to "what's written in the Cargo.toml next to each feature."

What are all the things enabled by each feature? This is analogous to "if I enable feature X, what do I actually get."

Obviously, a visitor can cover both modes and everything in between. But it isn't clear to me that there is any use case that lies in between. So I can't justify the complexity of maintaining this API, nor that of users learning how to use it.

For my 2 cents, I'd prefer an API that offered functionality for each of the two modes, and was designed to be simple and easy to understand.

I'd 100% agree if such a Visitor API would be the only API provided.
For common use-cases I would expect the crate to provide batteries-included "collectors", like the TransitiveFeatureCollector that I use in the tests.

Going with a visitor API doesn't have to mean "difficult to use". Instead it allows you to neatly compartmentalize the logic in a "walker" and avoid diluting the API of Metadata with methods and options for all the different usage needs.

Also, speaking from an implementation point of view: migrating to a visitor made the implementation of the feature graph traversal logic much simpler and cleaner. And because it is driven from outside, a visitor also makes the implementation less likely to accumulate all kinds of intricate control logic (to cover different use cases, etc.) in the future.

obi1kenobi · 2024-12-05T16:33:28Z

Pulling your comment out of the thread for broader visibility:

But whether or not this method should fail on missing packages depends on the use-case, I think.
With a visitor this decision is totally up to you, without bloating the crate's API with options.

Similarly one may want to be able to pick between different collection modes: recursive (i.e. transitive) and non-recursive (i.e. immediate) feature dependencies. Again, a visitor API leaves this up to the user, without bloating the crate's API with options.

I don't find this argument convincing, personally. Leaving more things up to the user is not always a good choice.

cargo_metadata already has low-level functionality that supports all the possible special cases and leaves maximum choice up to the users. I don't think it's worth adding a new and very complex layer that also tries to be super generic.

I think what's missing is a simple way to answer the most common questions:

What is directly enabled by this feature?
What are all things transitively enabled by this feature?

Something a user could easily find, quickly understand, and immediately use in their own code. A visitor API in my book accomplishes none of that.

I'm not the cargo_metadata maintainer so the final call isn't mine to make. But as a user of cargo_metadata, I wouldn't switch from what I currently have to a visitor-based API. It would make everything more complex for me to understand and maintain, not less.

regexident · 2024-12-05T16:43:24Z

I'm not the cargo_metadata maintainer so the final call isn't mine to make. But as a user of cargo_metadata, I wouldn't switch from what I currently have to a visitor-based API. It would make everything more complex for me to understand and maintain, not less.

And I wouldn't expect you to. For the 80% I'd expect convenient methods to exist. For the other 20% a visitor API is preferable.

One such use case is mine (the one you reminded me of when you brought up your own): I'd need to be able to selectively block certain downstream features, along with their dependencies, from being collected. With a method that only returns the immediate dependencies I'd be out of luck, as I would have to write the recursive traversal logic myself, which is arguably more difficult (to do right) than implementing a visitor. And a one-off method for transitive feature dependencies wouldn't be of much use either.
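A minimal sketch of that pruning use case, restricted to a single feature table to keep it short. The callback plays the role of the visitor: returning `false` stops the walk from descending into a feature. `walk_features` and `collect_except` are hypothetical names, not anything from the PR.

```rust
use std::collections::{BTreeMap, BTreeSet};

type FeatureTable = BTreeMap<String, Vec<String>>;

/// Walks features depth-first; `enter` decides whether to descend into each one.
/// Values without an entry in `table` (e.g. `dep:foo`) are visited as leaves.
fn walk_features(table: &FeatureTable, feature: &str, enter: &mut dyn FnMut(&str) -> bool) {
    if !enter(feature) {
        return;
    }
    for value in table.get(feature).into_iter().flatten() {
        walk_features(table, value, enter);
    }
}

/// Collects everything reachable from `root` (including `root` itself),
/// skipping blocked features together with whatever only they would enable.
fn collect_except(
    table: &FeatureTable,
    root: &str,
    blocked: &BTreeSet<&str>,
) -> BTreeSet<String> {
    let mut collected = BTreeSet::new();
    walk_features(table, root, &mut |feature: &str| {
        // Blocked features are never entered; already-collected ones are not re-entered.
        !blocked.contains(feature) && collected.insert(feature.to_owned())
    });
    collected
}
```

Note that a feature below a blocked one can still be collected if some unblocked path also reaches it, which is usually the behavior one wants here.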

oli-obk · 2024-12-09T14:56:39Z

I also feel like the visitor is overkill, but until we get generators, I can also see how an iterator-based version may be annoying to implement.

I'd be fine adding this under an unstable_visitor feature gate, and using it to implement a higher level API (along with giving you the ability to play with it directly), but I'm hesitant to support a visitor API forever.

regexident mentioned this pull request Dec 4, 2024

Provide higher-level API around a Package's features? #279

Open

Implement visitor API for exploring a package's features and their dependencies

161d86c

regexident force-pushed the package-features-api branch from a2ded76 to 161d86c Compare December 4, 2024 19:21

obi1kenobi reviewed Dec 5, 2024

regexident added 2 commits December 5, 2024 16:32

Remove unnecessary renamed() method from Dependency

f8c1038

Improve and clean up documentation of FeatureVisitor trait

13cbc6b
