Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add first draft of vision #6

Merged
merged 3 commits into from
Dec 17, 2022
Merged

Add first draft of vision #6

merged 3 commits into from
Dec 17, 2022

Conversation

matthiasbeyer
Copy link
Owner

This PR adds the vision document for this crate, to document what the idea is and what we aim for.

Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
@matthiasbeyer
Copy link
Owner Author

Several things that have to be discussed:

  • Do we actually want JSON-path style value accessing? I do not see much benefit over deserializing to T, actually. But users might want that.
  • There is also an issue about include directives which I am not sure whether we should or want to support. It is clearly a usecase, but adds an enormous amount of complexity to the loading mechanism infrastructure!
  • I would like to include the self-describing configuration types idea into the vision, but others might consider this "icing on the cake" rather than a core requirement.
  • Should we write down that the crate "MUST NOT crash or lead to panic!()s?"
  • Default values and overrides should definitively be supported, but are not yet in the document. Should we add them? I consider them nothing special, as a default setting is nothing more than the "lowest layer" and an override is nothing more than the "highest layer" in the approach described by now.
  • This comment describes that the builder-pattern approach is key. I would like to keep that pattern. It is not written in the vision yet, should we?

@pksunkara and @Gankra but maybe also @epage and others from the rust-cli WG (feel free to ping them) - feel free to comment and suggest changes! 😉 👍

* The crate MUST allow deserializing the loaded configuration objects to a type
that implements `serde::Deserialize` although layering-information and context
is lost this way
* The developer of an app MUST be able to find where a specific value of
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add

configuration errors must point back to the backend location for the layer that they originated from

@epage
Copy link

epage commented Nov 10, 2022

Unsure how this fits in but some things I've run into with my applications

  • Migrate a field (must look up both fields at each layer before moving onto the next layer)
  • --dump-config flag support (take all layers and show them to the user in the file format used by the app)
  • Report errors for extra fields, report warnings for extra fields

And one out of left field that would probably need to be its own crate:

  • Decentralized schema support
    • .gitconfig has no central schema for the configuration but each command access their own set of fields but I still want to interact with it in a higher-level way

Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
@matthiasbeyer
Copy link
Owner Author

Migrate a field (must look up both fields at each layer before moving onto the next layer)

Can you give a user-story for this? I am not sure I completely understand what you mean here.

--dump-config flag support (take all layers and show them to the user in the file format used by the app)

That should be easily possible. A "give me a full example of a config"-type functionality would require that the complete configuration schema is known. That would mean that the user of the crate can actually write a struct MyConfiguration and make that Deserialize (and Serialize). That would be the full configuration, so dumping that with default values is basically a no-brainer.

Or do you mean something different?

Report errors for extra fields, report warnings for extra fields

I think that should also be possible, I will think about it a bit. I think this also plays into the next point

Decentralized schema support

I totally get your point here. The basic issue here is, that the full configuration schema is not known at compiletime but only at runtime. I think we can and should add some schema-verification functionality. That would also cover the "report errors for extra fields" usecase. My idea here would be to let the user write some form of verification "field abc must be there" and then validate the loaded configuration against that.

The "issue" in this crate is basically that each field in a schema is optional if you only look at a single layer of configuration, even if the field is non-optional in the joined configuration object.
Thus, schema verification becomes more involved, but IMO not impossible.

@epage
Copy link

epage commented Nov 10, 2022

Migrate a field (must look up both fields at each layer before moving onto the next layer)

Can you give a user-story for this? I am not sure I completely understand what you mean here.

The cases I remember

  • A field had a typo or unclear name, so we made a new field. Specifying one correctly at a lower layer shouldn't override the old one at a higher layer.
  • We had inconsistent naming (skip- vs no- vs disable- prefixes) and we wanted to consolidate / make it easier to predict
  • We wanted to drop the negative prefixes

--dump-config flag support (take all layers and show them to the user in the file format used by the app)

That should be easily possible. A "give me a full example of a config"-type functionality would require that the complete configuration schema is known. That would mean that the user of the crate can actually write a struct MyConfiguration and make that Deserialize (and Serialize). That would be the full configuration, so dumping that with default values is basically a no-brainer.

Basically

Decentralized schema support

I totally get your point here. The basic issue here is, that the full configuration schema is not known at compiletime but only at runtime.

That might be one way to put it, I'm unsure. Its more of there is no canonical full schema. The schema is defined on a per-field basis.

I've been playing a little with it in git dive but still not happy with the results.

I wouldn't say this is exclusively a feature related to git either. I also did it for an internal build tool at a company I worked at. It is more related to when multiple plugins all share the same config file. For git and that build tool, I liked to do it on a per-field basis. I haven't fully looked into how cargo handles this but I it has a decentralized schema but do not provide a type-safe API, instead having the the caller have to know which type to do the lookup with..

@matthiasbeyer
Copy link
Owner Author

A field had a typo or unclear name, so we made a new field. Specifying one correctly at a lower layer shouldn't override the old one at a higher layer.

So, just that I understand correctly:

Your config type looked like this: struct Config { field_with_typo: bool }
and you fixed that to struct Config { field_without_typo: bool }.

And you layer something else above that, which does contain the with_typo one. Your expectations would now be that the User can specify field_without_typo and it does override the field_with_typo from the layer above?

Did I get that right?


We had inconsistent naming (skip- vs no- vs disable- prefixes) and we wanted to consolidate / make it easier to predict
We wanted to drop the negative prefixes

What would be your expectations in this case? How can the configuration crate help with consolidating this?
If it would be that the crate helps consolidating these values: Wouldn't be the better experience for the user or actually everyone involved be, that the user adapts their configuration file?

Or am I misunderstanding your point here?


I wouldn't say this is exclusively a feature related to git either. I also did it for an internal build tool at a company I worked at. It is more related to when multiple plugins all share the same config file. For git and that build tool, I liked to do it on a per-field basis. I haven't fully looked into how cargo handles this but I it has a decentralized schema but do not provide a type-safe API, instead having the the caller have to know which type to do the lookup with...

I think I know what you mean.

For the "all fields are known" usecase (which is "deserializing into a known Type"), config-rs would only provide the "layering". If you have only one configuration file and all your configuration options are known at compiletime, I do not see any benefit of config-rs over plain serde_json/toml/serde_yaml/etc.

config-rs (IMHO) should target the usecases where you actually have multiple files and want to merge them. That's where the benefit lies. If you also happen to have a schema that is not fully known at compiletime and you have to access values at runtime because you cannot deserialize the configuration to T, config-rs should shine for you.
I even think that config-rs should maybe have options to do "partial deserialize", if that's something we can support.

So if we take git as an example: Some config members are known (the ones of the git binaries itself). They can be deserialized into an actual type: config.deserialize_partial::<GitCoreConfig>()... but some git-customfun tool can put its configuration in there as well... so that tool should also be able to deserialize their share out of the configuration object: config.deserialize_partial::<GitCustomFooConfig>().

Maybe that's something we can explore as well?

Does the above match your thoughts/expectations?

@epage
Copy link

epage commented Nov 12, 2022

For typos, yes, that is correct. For evolving the fields, that is exactly the same case as typos. In both cases, the goal is to not break compatibility. Ideally, we produce warnings in both cases to help people migrate.

config-rs (IMHO) should target the usecases where you actually have multiple files and want to merge them. That's where the benefit lies. If you also happen to have a schema that is not fully known at compiletime and you have to access values at runtime because you cannot deserialize the configuration to T, config-rs should shine for you.
I even think that config-rs should maybe have options to do "partial deserialize", if that's something we can support.

So if we take git as an example: Some config members are known (the ones of the git binaries itself). They can be deserialized into an actual type: config.deserialize_partial::()... but some git-customfun tool can put its configuration in there as well... so that tool should also be able to deserialize their share out of the configuration object: config.deserialize_partial::().

While you can stretch it to work this way, really you are working on the field level rather than the tables level. There are two levels of tables and both are decentralized, so trying to deserialize into top-level types adds a lot of ceremony to the API. Ideally, it would be more like cargo where you pass a path to the field and specify what type to deserialize that field to.

@epage
Copy link

epage commented Nov 12, 2022

Another use case I thought of is that Vec fields can be layered in different ways. Like with other fields, you want it to overwrite other layers. In some cases, you want it to extend the other layers.

For example, in git stack, we maintain a list of globs (gitignore syntax) for what branches are protected. gitignore has its own layering built-in, so we want to use that, so we extend at the layers rather than overwrite.

@matthiasbeyer
Copy link
Owner Author

Ideally, it would be more like cargo where you pass a path to the field and specify what type to deserialize that field to.

Yes, something like that exists today and I plan to support it as well! 👍

Another use case I thought of is that Vec fields can be layered in different ways. Like with other fields, you want it to overwrite other layers. In some cases, you want it to extend the other layers.

That is indeed something we have to carefully think about and design! And actually something I haven't thought about yet, thanks for bringing it up! 👍 ❤️

@epage
Copy link

epage commented Nov 12, 2022

Yes, something like that exists today and I plan to support it as well! 👍

In the systems I've worked on, outside of cargo, they've gone a step further and defined a type that holds the key and has a generic parameter for the type so you can centralize knowledge of this.

Roughly

const PAGER_FIELD = TypedField::<Pager>::new("pager.path");

config.get(PAGER_FIELD)

@Gankra
Copy link

Gankra commented Nov 13, 2022

This all makes sense, I especially like the stuff about holding onto all the configuration context. In particular it would be great if I could point to the what specified a config value if it later turns out to be problematic.

Say, you said to use-dir: "blah/blah" but when I try to access that dir I find it doesn't exist. It would be nice if I could use miette to say:

x Couldn't open source dir blah/blah

/home/gankra/config.json:12

  use-dir: "blah/blah",
  ~~~~~~~~~~~~~~~~~~~~~
                      - requested here

I've experimented with this a bit in cargo-vet using toml-rs's magic Spanned<T> type.

I also experimented with implementing a parser as a validator on top of kdl-rs' toml-edit-like API in kdl-script. To do this I added a .span() method to tons of types in to kdl-rs.

Unfortunately there isn't really any harmonious portable solution for this (and the API I designed for kdl-rs isn't great). Also really production-grade span stuff would involve interners and compression stuff. I think there's an opportunity here for config-rs to be a bit of a guiding light on this stuff, since it wants to cram everything into a unified framework.

@matthiasbeyer
Copy link
Owner Author

After some brief discussion with @zkat on mastodon (thread): I guess we also want to put down into the vision the fact that the crate should be as format agnostic as possible.

Right now, we're holding JSONish data still:

pub enum ConfigElement {
    Null,
    Bool(bool),
    I8(i8),
    I16(i16),
    /* and some more... */
    List(Vec<ConfigElement>),
    Map(HashMap<String, ConfigElement>),
}

(here).

But for KDL or XML to work, that's not enough. I guess we would need some system where ConfigElement is rather a trait that has methods for accessing inner values (map, list,...).
I'm curious how such a system would look like, but I guess it is not the right place to design this here. We should just define that this should be possible, not how.

That as well as span information (as noted by @Gankra above) should be considered crucial features, I think.

Signed-off-by: Matthias Beyer <mail@beyermatthias.de>
@matthiasbeyer
Copy link
Owner Author

Does 7994b36 fit? Is there something else that should be added?

@matthiasbeyer
Copy link
Owner Author

I've been playing with this a bit now and I noticed that being format independent will make things really hard. Because the layering and accessing of values needs to be able to understand and traverse data. If we abstract access to values via a trait inside config-rs, that might make things really hard to implement.

I do still think that this is a viable goal and should be included, though!

@matthiasbeyer
Copy link
Owner Author

bors merge

@bors
Copy link
Contributor

bors bot commented Dec 17, 2022

Build succeeded:

@bors bors bot merged commit 66aac5b into master Dec 17, 2022
@bors bors bot deleted the vision branch December 17, 2022 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants