Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ConfigOption names into an Enum #4517

Closed
Tracked by #4349
alamb opened this issue Dec 5, 2022 · 9 comments · Fixed by #4771
Closed
Tracked by #4349

Make ConfigOption names into an Enum #4517

alamb opened this issue Dec 5, 2022 · 9 comments · Fixed by #4771

Comments

@alamb
Copy link
Contributor

alamb commented Dec 5, 2022

    Is there any reason we couldn't just make these an enum? Then we could make the `set_x` and `get_x` methods type safe.

The rationale here is to make the configuration code more "rust like" and follow the Rust way of strongly typed when possible

Originally posted by @thinkharderdev in #4492 (comment)

@alamb
Copy link
Contributor Author

alamb commented Dec 5, 2022

So I started bashing this out, with the incremental way like:

/// DataFusion specific configuration names.
pub enum ConfigName
{
    /// Configuration option "datafusion.execution.target_partitions"
    TargetPartitions,
    /// Configuration option "datafusion.catalog.create_default_catalog_and_schema"
    CreateDefaultCatalogAndSchema,
    /// Configuration option "datafusion.catalog.information_schema"
    InformationSchema,
    /// Configuration option "datafusion.optimizer.repartition_joins"
    RepartitionJoins,
    /// Configuration option "datafusion.optimizer.repartition_aggregations"
    RepartitionAggregates,

    /// Configuration option "datafusion.optimizer.repartition_windows"
    RepartitionWindows,
...}

But then I was thinking, if we are going to change the API anyways, why not go all the way with something fully statically typed and removing ConfigDefinition entirely:

/// DataFusion specific configuration names.
pub enum ConfigValue
{
    /// Configuration option "datafusion.execution.target_partitions"
    TargetPartitions(usize),

    /// Configuration option for arbitrary user defined data
    UserDefined {
      name: String,
      value: Option<ScalarValue>
    },
...}

impl ConfigValue {
  /// Return the name of this configuration value
  fn name(&self) -> &str {
    match self {
      Self::TargetPartitions(_) => "datafusion.execution.target_partitions",
      ...
    }

  /// Return the human readable description for this configuration value
  fn description(&self) -> &str {
    match self {
      Self::TargetPartitions(_) =>
                    "Number of partitions for query execution. Increasing partitions can increase \
                 concurrency. Defaults to the number of cpu cores on the system.",
...
  }

  /// set the value of the configuration value
 fn set_value(&mut self, new_value: ScalarValue) ->Result<()>{
     match (self, value) {
      (Self::TargetPartitions(v), ScalarValue::UInt64(Some(new_value))) => *v = new_value,
      (Self::TargetPartitions(v), _)  => return Err("Expected uint64 for {} but got {:?}", self.name(), new_value)).
...
}

But before I go crank that through the process I wanted to get some feedback if that was a desirable way to go. I would retain the ability to store arbitrary name/value pairs in the metadata.

What do you think @thinkharderdev @yahoNanJing @andygrove @avantgardnerio ?

@tustvold
Copy link
Contributor

why not go all the way with something fully statically typed

If you're going to go for statically typed why not just go with just a struct? We could always add a HashMap for custom extensions as the final field.

@avantgardnerio
Copy link
Contributor

avantgardnerio commented Dec 14, 2022

☝️ what he said :)

I'd like to be able to use Default::default() and struct update syntax if possible.

BTW, thank you @alamb for taking on this thankless task.

Recurring conflicts regarding multiple PRs each changing config in their own way is why I still have two of my own sitting in limbo. It will be nice to have this sorted - it should decrease the friction for everyone and PRs should flow faster.

@tustvold
Copy link
Contributor

tustvold commented Dec 14, 2022

struct update syntax

One potential issue with struct update syntax, is any additional config is technically a breaking change. Unfortunately ..Default::default() doesn't work with #[non_exhaustive] as you might add a private field. See here. That being said the following is hardly the end of the world

let mut config = Config::default();
config.property = ...;
config

@thinkharderdev
Copy link
Contributor

I don't have a strong opinion either way. The one thing I do like about @alamb original enum proposal is that is that we get a nice way to associate each config with a config key and description that would maybe a allow a more generic way of handling the decoding from CLI params and spitting out documentation/help text. Although you could probably do the same with a macro in the struct case.

@andygrove
Copy link
Member

I don't think I have a strong opinion either as long as we retain the current capabilities, specifically:

  • Docs get generated from the config definitions
  • Custom key/value pairs can be used so that platforms building on DataFusion can leverage the same mechanism
  • Configs can be serialized easily

@tustvold
Copy link
Contributor

tustvold commented Dec 28, 2022

I had a play around and came up with this - https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8c1ababec354db7118d3ff5ffd878f1e

I think it meets all of the above requirements, whilst being a "simple" structured type, and not requiring a proc-macro.

Let me know what you think, if people are happy with it I'd be happy to get something like it integrated with DataFusion.

@alamb
Copy link
Contributor Author

alamb commented Dec 28, 2022

Let me know what you think, if people are happy with it I'd be happy to get something like it integrated with DataFusion.

I think it looks good to me 👍

@thinkharderdev
Copy link
Contributor

I had a play around and came up with this - https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8c1ababec354db7118d3ff5ffd878f1e

I think it meets all of the above requirements, whilst being a "simple" structured type, and not requiring a proc-macro.

Let me know what you think, if people are happy with it I'd be happy to get something like it integrated with DataFusion.

Nice, seems like a good approach

tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Dec 29, 2022
tustvold added a commit that referenced this issue Jan 3, 2023
* Structify ConfigOptions (#4517)

* Simplify

* Fix environment variables

* Tweaks

* Update datafusion-cli

* Format

* Misc cleanup

* Further fixes

* Format

* Update config.md

* Flatten configuration

* Document macro
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants