inline_threshold not encoding time value? #468

Open
rsignell opened this issue Jun 23, 2024 · 9 comments

@rsignell

In the example below I was expecting that time would get encoded because inline_threshold=400 and time is only 8 bytes long.

Below we see that depth is encoded (it's 332 bytes long), but time is not.

Is this expected behavior?

image

@martindurant
Member

It seems inline is only called for normal, non-record arrays. Another thing to fix! Obviously, not too much use of netCDF3 has been seen.

@rsignell
Author

Well, it's not a high priority for me -- if truth be told, I was really just trying to figure out how to inject a known time value into that reference.

@martindurant
Member

You can replace the value with a binary, if you want. But also, there already is a function that does exactly this process for any reference set, so it just needs to be invoked.

@kmsampson

> You can replace the value with a binary, if you want. But also, there already is a function that does exactly this process for any reference set, so it just needs to be invoked.

Can you point to the function that can inject a known time value or how to replace the value with a binary?

@martindurant
Member

This may be fixed in #466, if you would care to try.

@kmsampson, the spec says:

the str format of a reference value may be:
a string starting “base64:”, which will be decoded to binary
any other string, interpreted as ascii data

so set the key's value in the JSON accordingly. If the references are still in memory, you can also directly assign the binary you want the key to have. If you have already made a filesystem, you could also use its interface: fs.pipe("time/0", b"\x00\x00..."). The modification can be written out again with fs.save_json, or with a .flush on the parquet/lazy storage, if you are using that. Too many options?
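For anyone following along, here is a minimal sketch of the "base64:" route using only the standard library. The reference dict, the file name, the time value, and the big-endian float64 layout are all assumptions made for illustration; check the dtype in your own .zarray metadata before packing.

```python
import base64
import struct

# Hypothetical reference set; only the {"refs": {...}} layout follows the thread.
d = {"version": 1, "refs": {"time/0": ["example.nc", 1234, 8]}}

# Assumption: time is a single big-endian float64 (">f8"), i.e. 8 bytes.
time_value = 12345.5
data_bytes = struct.pack(">d", time_value)

# Per the spec quoted above, a string starting with "base64:" is decoded to binary.
d["refs"]["time/0"] = "base64:" + base64.b64encode(data_bytes).decode("ascii")

# Round trip: strip the prefix, decode, and unpack to check the stored value.
encoded = d["refs"]["time/0"]
decoded = base64.b64decode(encoded[len("base64:"):])
restored = struct.unpack(">d", decoded)[0]
```

The resulting dict is plain JSON-serializable text, so it can be written out with json.dump without any further encoding step.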

@rsignell
Author

Yes, this is fixed in #466:
image

I still can't figure out how to assign a specific value though:
image
(perhaps I should ask this in discussions?)

@martindurant
Member

I would do

d["refs"]["time/0"] = data_bytes

and kerchunk.utils._encode_for_JSON or consolidate can do the encoding for you.

You can also make a filesystem and interact with it

fs = fsspec.filesystem("reference", fo=d, ...)
fs.pipe("time/0", data_bytes)  # pipe writes bytes to a key; cat only reads
fs.save_json(filename)  # or grab fs.references

@rsignell
Author

I'm feeling kind of dumb here, but I still don't get it:
image

@martindurant
Member

Oh sorry, the function works on the inner reference dict, d["refs"]
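To make the inner-dict point concrete, here is a rough sketch of assigning raw bytes in memory and then encoding d["refs"] (not d itself) for JSON. The helper encode_refs_for_json is a hypothetical stand-in written for this example, not kerchunk's own function, and the reference dict and big-endian float64 time value are likewise assumptions.

```python
import base64
import json
import struct


def encode_refs_for_json(refs):
    """Hypothetical helper: make raw bytes values JSON-serializable.

    ASCII-decodable bytes become plain strings; anything else becomes a
    "base64:"-prefixed string, matching the spec quoted earlier in the thread.
    """
    out = {}
    for key, value in refs.items():
        if isinstance(value, bytes):
            try:
                out[key] = value.decode("ascii")
            except UnicodeDecodeError:
                out[key] = "base64:" + base64.b64encode(value).decode("ascii")
        else:
            out[key] = value
    return out


# Hypothetical reference set with the outer {"version", "refs"} wrapper.
d = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "time/0": ["example.nc", 1234, 8],
    },
}

# Assign the binary directly, as suggested above (assumed dtype: ">f8").
d["refs"]["time/0"] = struct.pack(">d", 12345.5)

# Encode the inner dict, not the outer wrapper.
d["refs"] = encode_refs_for_json(d["refs"])
text = json.dumps(d)
```

Passing the outer dict instead would leave the bytes value nested under "refs" untouched, and json.dumps would then fail on it.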
