Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor config #479

Open
avantgardnerio opened this issue Oct 29, 2022 · 11 comments
Open

Refactor config #479

avantgardnerio opened this issue Oct 29, 2022 · 11 comments
Labels
enhancement New feature or request

Comments

@avantgardnerio
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

In #472 it became clear there is a lot of boiler plate, potential runtime issues that could be prevented statically, and generally un-idiomatic Rust issues with our current config solution.

Describe the solution you'd like

It would be good if wee could buy or build a library like envy to serde env to statically typed structs like we do with clap at present.

Describe alternatives you've considered

  1. Keep as HashMaps

Additional context

This issue is opinionated, and serves merely as a place for discussion. If there are other potential resolutions (or no resolution), let's discuss that here.

@avantgardnerio avantgardnerio added the enhancement New feature or request label Oct 29, 2022
@andygrove
Copy link
Member

Whatever we do here, it would be good to consider the design in DataFusion as well since we need to make the two work together.

@mingmwang
Copy link
Contributor

@yahoNanJing

@tfeda
Copy link
Contributor

tfeda commented Nov 23, 2022

I'm interested in picking this up, mostly because there are some other configurations I'm interested in adding that would just add to the work needed here. I'm not partial to keeping things in a HashMap vs a struct, but I think that it should be consistent across the components (the client uses the former, while the scheduler uses the latter).

configure_me (current library) has support for env variables that can be added in. config-rs is another popular library that supports both env variables and configuration files. A downside to switching to envy or config-rs is the loss of man page generation and argument parsing that configure_me provides to the scheduler/executor binaries, but maybe that's not needed.

Considering DataFusion, right now we just use the default configurations with some custom inputs.

pub fn create_df_ctx_with_ballista_query_planner() {
   ...
   let session_config = SessionConfig::new()
        .with_target_partitions(config.default_shuffle_partitions())
        .with_information_schema(true);
   ...
}

I think it would be good to allow users to input their own SessionConfig, or to use its from_env() function in addition to the BallistaConfig when working with environment variables on the client side.

@avantgardnerio
Copy link
Contributor Author

FWIW, I'd love to switch to a more Default & struct update syntax format for configs. Here's an example in practice of building very complex structs for the SQL parser, but only overriding certain fields that the user cares about:

                    let query = ast::Query {
                        body: Box::new(SetExpr::Select(Box::new(ast::Select {
                            projection,
                            from: vec![table],
                            selection,
                            ..default_select()
                        }))),
                        ..default_query()
                    };

IMO this more concisely solves the problem without the multiple state mutations of the builder pattern, which seems closer to idiomatic Rust.

CC @alamb because he's talked about this in datafusion.

@alamb
Copy link
Contributor

alamb commented Nov 23, 2022

Whatever we do here, it would be good to consider the design in DataFusion as well since we need to make the two work together.

I have some thoughts about this in DataFusion -- I will try and write them up coherently later today

@alamb
Copy link
Contributor

alamb commented Nov 23, 2022

I wrote up some thoughts in apache/datafusion#4349

I think DataFusion has both 🤦 key=value type config as well as a rust struct style config. I think we should unify on one or the other of the approaches

@tfeda
Copy link
Contributor

tfeda commented Nov 29, 2022

On a similar note, TiKV exclusively uses cli arguments in their server binary for the "important" configuration items in addition to config files and environment variables for "lesser" configurations.

For example, gRPC configurations would be included as cli flags with the current configuration system. I'm proposing they'd exclusively be tweaked with the environment or a properties file since they're not generally changed often when starting up the servers.

I think that would keep executables' man pages clean as more configurations are added.

@avantgardnerio
Copy link
Contributor Author

I'd strongly recommend removing the hashmap (if we can, do we need to support configuration for downstream projects?), and keeping everything compile time type checked.

I know we don't like bringing in dependencies, but something like this would keep everything checked / generated / etc: https://docs.rs/envconfig/latest/envconfig/

@alamb
Copy link
Contributor

alamb commented Nov 29, 2022

I'd strongly recommend removing the hashmap (if we can, do we need to support configuration for downstream projects?), and keeping everything compile time type checked.

For what it is worth I think think we can still have compile time type checks with a HashMap implementation using a builder style API on SessionConfig:

https://github.com/apache/arrow-datafusion/blob/dd3f72ad13df3e3ab2efde73eba546012eaf10fd/datafusion/core/src/execution/context.rs#L1240-L1245

What do you think about a style like this @avantgardnerio ?

let config = SessionConfig::new()
  .with_target_partitions(18)

@avantgardnerio
Copy link
Contributor Author

I don't love builders for aforementioned reasons I won't belabor, but I think the key here is getting away from what we have now: a combinatorial explosion of initialization methods for each different scenario - that's what's been causing me rebase hell and preventing me from merging PRs. I think builders solve that: two outstanding PRs could both add things to the builder without conflicting with each other, so mission accomplished.

TLDR: LGTM.

@alamb
Copy link
Contributor

alamb commented Nov 30, 2022

two outstanding PRs could both add things to the builder without conflicting with each other, so mission accomplished.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants