-
Notifications
You must be signed in to change notification settings - Fork 796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
object_store: builder configuration api #3436
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the direction of this. Can you outline a bit how this is exposed to python?
enum AzureConfigKey { | ||
/// The name of the azure storage account | ||
/// | ||
/// Supported keys: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see now why a second layer is needed: you want to support multiple variants of the environment variables. This is ok with me, but supporting more than 2-3 variants for the same name will be confusing in the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually one of my main concern. On one hand there simply is quite a few variants out there people migt be used to, on the other hand, having many options for the same thing is confusing.
Ah I see that you have this separate project here: https://github.com/roeap/object-store-python. I can start to play with this :) I still need the Rust object_store configuration to be inert data that I can pass around. |
object_store/src/azure/mod.rs
Outdated
UseEmulator, | ||
} | ||
|
||
static ALIAS_MAP: Lazy<HashMap<&'static str, AzureConfigKey>> = Lazy::new(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a HashMap noticeably faster than as simple match
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not - main reason for a map is so we can iterate through it when reading from the env.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, we got rid of the maps now :).
@roeap I like where this is going. I am mostly testing with AWS, so for me a good next step is to integrate the AWS side in this pola-rs/polars#5972 I plan to use this PR and your python companion once the work has progressed enough. I assume that once this is stable it will also be suitable for How do you want to collaborate on this?
Let me know. |
@winding-lines, once we know the pattern, most of this is just leg-work, so best to just get it done 😆. Best case we can get this in before #3422. |
object_store/src/aws/mod.rs
Outdated
/// Set an option on the builder via a key - value pair. | ||
pub fn with_option( | ||
mut self, | ||
key: impl Into<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@roeap thanks for the super quick turn around time on this one. This is pretty close to what I am thinking, I would like to suggest one more change?
As an user I would like to have my configuration fully checked by the compiler. So I would like to either be able to pass in one of the AmazonS3ConfigKey
s as the key or have a helper function that returns one of the string representations. This way in my code I never have to type a string constant for the key.
Let me know what you think....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ported my PR to the code so far. I had to toggle back and forth between the 2 PRs to make sure I get these constants correctly.
https://github.com/pola-rs/polars/pull/5972/files#diff-28ad3830b06d49511094951431beea1edcf6a567f646250482e9f00f769ace27R16
This is error prone specially since you have a type safe Enum sitting right there...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now have some conversions implemented for the config key enums, so you should be able to pass in the variants directly, or constuct a map with the variants as keys. Is that what you are looking for here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like where this is headed, I left some comments.
The main part I'm not certain about is how to propagate errors, ideally an unrecognised config key, or a config key containing an invalid value would return an error. To avoid breaking backwards compatibility, from_env could just log this error.
let aws_endpoint = "object_store:fake_endpoint".to_string(); | ||
let aws_session_token = "object_store:fake_session_token".to_string(); | ||
let options = HashMap::from([ | ||
(AmazonS3ConfigKey::AccessKeyId, aws_access_key_id.clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very ergonomic, many thanks 🎉
object_store/src/aws/mod.rs
Outdated
@@ -379,6 +383,170 @@ pub struct AmazonS3Builder { | |||
client_options: ClientOptions, | |||
} | |||
|
|||
/// Configuration keys for [`AmazonS3Builder`] | |||
#[derive(PartialEq, Eq, Hash, Clone, Debug, Copy)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@roeap this is very close, I think the last issue from my side is to derive serde::Serialize
and Deserialize
for the *ConfigKey
enums.
Pretty please 🤗
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, serialize away 😆.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking really nice, just some minor nits.
@winding-lines are you happy with this?
@@ -673,13 +872,33 @@ fn url_from_env(env_name: &str, default_url: &str) -> Result<Url> { | |||
Ok(url) | |||
} | |||
|
|||
fn split_sas(sas: &str) -> Result<Vec<(String, String)>, Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any tests of this logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now we do .. tested the refrerence key before that is was useful - albeit now exired :)
object_store/src/azure/mod.rs
Outdated
|
||
#[test] | ||
fn azure_test_config_from_map() { | ||
let azure_client_id = "object_store:fake_access_key_id".to_string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I think you could drop the to_string() and clone() from this test and it should work
Yes, very happy and grateful to both of you 🙇 |
Benchmark runs are scheduled for baseline = 3213ef5 and contender = 28a04db. 28a04db is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #3419.
Rationale for this change
See discussion in #3429
What changes are included in this PR?
This is just a quick draft based on some previous work in other crates, to get some feedback on the API. As an example I implemented a proposal for the azure builder, since one issue is most prominent with azure. Specifically, for azure there are many different commonly used keys vor configuration options and environment variables. Thus the implementation includes some handling for aliases. While I do see the need for this to be handled, I am unsure if this shuld be the responsibility of this crate.
Any feedback is great appreciated - @tustvold @crepererum @winding-lines
Are there any user-facing changes?
yes, a new way to provide configuration to object sore builders.