-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add custom kvp metadata to chunk dictionary #51
Conversation
Great improvements, and good fix to the info command too! Will go through it in more detail later but here are some initial notes.
Yes, I do think the map of kvp should have deterministic order, preferably ordered as when inserted. And since the protobuf map is really just a repeated field of kvp's the order in the dictionary should be stable so it's just a question of how to map it on the Rust side.
Personally I would find it to be enough with compress I should probably put this in a contribution description somewhere... But if/when you do modifications to a PR please just modify each commit which is affected. Like switching to a BTreeMap would belong in the |
Regarding I tried switching the HashMap to a BTreeMap as suggested: prost_build::Config::new()
.btree_map(&["."])
.compile_protos(&["proto/chunk_dictionary.proto"], &["proto/"])
.unwrap(); It kind of worked, in the sense that all of our code after clap did preserve the order in which we initially constructed the first BTreeMap. However, clap itself seems to have no consistency in the order in which it gives the command line arguments. Additionally, even if clap gave us the arguments in the order in which they were specified (it doesn't) I don't know how we'll preserve the order between the two different arguments, because clap doesn't give us the "position" of the argument. Hope that makes sense? |
I can see that it may provide some value so I'm ok with providing both.
I see. I would've expected clap to give us the values in order of occurrence but maybe it doesn't. I didn't really find anything in the docs mentioning the order of multiple values and haven't tested for myself. About the order between But then, for now at least, I think it's enough if the bitar api for building and reading/iterating the kvps keeps the build time order. E.g one should be able to iterate the kvps in the same order as they were provided when building the archive. Cli argument order could potentially be fixed later. For now, maybe just order the pairs on key name on the cli side? Other than that I think I would prefer a regular Vec (like fn metadata(&self) -> &[(&str, &[u8])] And the simplest way to achieve this is probably to make a repeated field in protobuf. Since a protobuf map is really stored the same way it's just a question of how prost will decode it. message ChunkDictionary {
...
repeated KeyValuePair metadata = 8;
}
message KeyValuePair {
string key = 1;
bytes value = 2;
}
``
Does this make any sense to you? |
I think that sorting the keys alphabetically would be the best way to make sure they are deterministic. I can make that change.
I don't think performance is really a concern here (it's going to be fast enough either way) but switching to a Vec of key value pairs is going to be slower and will make consuming code more complex. We need at least one extra mapping in the library (to map to/from the protobuf types) and all the use cases require a hashmap anyways, so to support the info command we're going to need to collect the Vec into a hashmap anyway. I'll need to do the same in my own usecases: there's no value in having a Vec when you need to retrieve a value by key. |
Fair enough. So would storing in a BTreeMap and have an api something like below be sufficient you think? fn metadata_iter(&self) -> impl Iterator<Item=(&str, &[u8])>
fn metadata_value(&self, key: &str) -> Option<&[u8]> |
I have made the discussed changes, please let me know if there's anything further! |
49dd712
to
c0b3279
Compare
Think it looks good! Was thinking that the metadata length value in the info command should use a unit (eg use human_size!) but realized it looked a bit busy when unit it's repeated for every kvp when we have multiple kvps, so I think this is fine. Thank you, will merge! |
c0b3279
to
5c216c0
Compare
This PR contains the following changes:
metadata
field added to ChunkDictionary, and surfaced throughArchive<R>.metadata()
compress
command gets new arguments:--metadata-file <key> <path>
and--metadata-value <key> <value>
info
command (no arguments) displays metadata keys and content lengthinfo
command gets new--metadata-key <key>
argument which writes the kvp content to stdout (and prints nothing else)info
command to use sameInputArchive
logic asclone
command (rationale: info was incorrectly parsing a local Windows file path as a URL and I noticed this was solved already for clone. additionally, if the http server requires an authorization header it's not possible to use without the--http-header
argument).Possible improvements?
--metadata-file <key> <path>
and--metadata-value <key> <value>
arguments.