"readers" or "parsers" or what should we call them? #239

TomNicholas · 2024-09-13T00:20:26Z

I'm currently calling the functions that extract the metadata from a specific filetype (or many filetypes in kerchunk's case) "readers".

I feel like "reader" implies that they read actual array data into memory, but they don't. It's also confusing alongside Zarr "readers", which actually do read array data into memory.

But is there a less overloaded term? e.g.:

"parsers" - because they usually involve scanning a file and "parsing" out the metadata to be separate from the array data
"scanner"?
"inspector"?
"extractor"?
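To make the distinction concrete, here is a minimal sketch of what such a function does. It uses a made-up "TOY1" binary layout; the format, `parse_toy_metadata`, `ChunkRef`, and `VirtualArrayMetadata` are all hypothetical illustrations, not VirtualiZarr code. The function scans only the header and records where the array bytes live, leaving the payload on disk:

```python
import io
import struct
from dataclasses import dataclass

# Hypothetical toy file layout (NOT a real format):
#   4 bytes  magic b"TOY1"
#   1 byte   dtype code (b"f" = float64, b"i" = int32)
#   2 x u32  shape (rows, cols), little-endian
#   ...      raw array bytes in C order, stored as one single chunk
HEADER = struct.Struct("<4sc2I")  # 13 bytes, no padding because of "<"
DTYPES = {b"f": ("<f8", 8), b"i": ("<i4", 4)}


@dataclass
class ChunkRef:
    path: str
    offset: int
    length: int


@dataclass
class VirtualArrayMetadata:
    shape: tuple
    dtype: str
    chunks: dict  # chunk key -> ChunkRef (byte range only, no data)


def parse_toy_metadata(path: str, fileobj) -> VirtualArrayMetadata:
    """Read only the header; never load the array payload into memory."""
    magic, code, rows, cols = HEADER.unpack(fileobj.read(HEADER.size))
    if magic != b"TOY1":
        raise ValueError(f"{path} is not a TOY1 file")
    dtype, itemsize = DTYPES[code]
    # The payload starts right after the header; we just note its byte range.
    ref = ChunkRef(path=path, offset=HEADER.size, length=rows * cols * itemsize)
    return VirtualArrayMetadata(shape=(rows, cols), dtype=dtype, chunks={"0.0": ref})


# Usage: an in-memory 2x3 float64 "file" (header + 48 zero bytes).
buf = io.BytesIO(HEADER.pack(b"TOY1", b"f", 2, 3) + bytes(2 * 3 * 8))
meta = parse_toy_metadata("example.toy", buf)
```

After parsing, `buf.tell()` is still 13: only the header was consumed, and the result is metadata plus byte-range references rather than array values.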

As of #231 these currently all live in a module called virtualizarr.readers.

We also have virtualizarr.writers, but that's less of a problem given that (a) there will probably only ever be two of these, (b) there aren't existing xarray backends you could confuse them with, and (c) since you are writing metadata, "writer" is not a bad term for it.

(Related to pydata/xarray#9491, thought of this whilst proposing the API in #238)

douglatornell · 2024-09-17T17:28:47Z

What about "get_metadata"? That explicitly says what the function does, and it can live comfortably beside functions that read array data.

TomNicholas · 2024-09-20T19:07:14Z

@douglatornell that certainly clearly describes what they do! I just don't know what the corresponding noun would be, vz.open_virtual_dataset(... metadata_getter=...) hardly rolls off the tongue 😅

douglatornell · 2024-09-29T18:18:16Z

Perhaps I'm not understanding the nuances of the API.

In vz.open_virtual_dataset(... metadata_getter=...), the metadata_getter=... bit would be an item in reader_options, correct? So, why not make it

vz.open_virtual_dataset(... metadata=vz.readers.get_kerchunk_metadata, ...)

for example?

But do you need to pass a function name at that level? Why not

vz.open_virtual_dataset(... metadata="kerchunk", ...)

and do the dispatching to the appropriate vz.readers.get_*_metadata() function in vz.open_virtual_dataset() where you are handling so many of the other details of different storage formats. I suppose doing that would preclude a user-defined get_*_metadata() function, if that is a feature you want/need to support.
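A minimal sketch of that dispatching, assuming a hypothetical registry of built-in getters. `get_kerchunk_metadata` follows the name suggested above; `get_dmrpp_metadata`, `BUILTIN_GETTERS`, `open_virtual_dataset_sketch`, and the returned dicts are placeholders, not the real VirtualiZarr API. Accepting either a string or a callable keeps string dispatch for common formats without ruling out user-defined getters:

```python
from typing import Callable, Dict, Union

# A "metadata getter" takes a file path and returns metadata (placeholder: a dict).
MetadataGetter = Callable[[str], dict]


def get_kerchunk_metadata(filepath: str) -> dict:
    # Placeholder: a real getter would scan the file for chunk byte ranges.
    return {"source": filepath, "via": "kerchunk"}


def get_dmrpp_metadata(filepath: str) -> dict:
    # Placeholder for a reader that is not based on kerchunk.
    return {"source": filepath, "via": "dmrpp"}


# Hypothetical registry mapping string names to built-in getters.
BUILTIN_GETTERS: Dict[str, MetadataGetter] = {
    "kerchunk": get_kerchunk_metadata,
    "dmrpp": get_dmrpp_metadata,
}


def resolve_metadata_getter(metadata: Union[str, MetadataGetter]) -> MetadataGetter:
    """Return a callable, looking up string names in the built-in registry."""
    if callable(metadata):
        return metadata  # user-defined getter passed straight through
    try:
        return BUILTIN_GETTERS[metadata]
    except KeyError:
        raise ValueError(
            f"unknown metadata format {metadata!r}; "
            f"expected one of {sorted(BUILTIN_GETTERS)} or a callable"
        ) from None


def open_virtual_dataset_sketch(
    filepath: str, metadata: Union[str, MetadataGetter] = "kerchunk"
) -> dict:
    getter = resolve_metadata_getter(metadata)
    return getter(filepath)


builtin = open_virtual_dataset_sketch("air.nc", metadata="kerchunk")
custom = open_virtual_dataset_sketch(
    "obs.xyz", metadata=lambda p: {"source": p, "via": "custom"}
)
```

The `callable(metadata)` check is what keeps the string shortcut from precluding user-defined functions.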

TomNicholas · 2024-09-30T15:44:58Z

> the metadata_getter=... bit would be an item in reader_options, correct?

Not quite - reader_options is passed on to kerchunk readers, but there are other readers that are not based on kerchunk (e.g. the dmr++ reader).

> a user-defined get_*_metadata() function

I do think we want to support that. There are a few common file formats that are parseable to the zarr model, plus a long tail of niche formats that are also parseable to the zarr model (see #218). We should ship readers for the common ones bundled here, but allow users to plug in their own readers.
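One way to let users plug in their own readers is a small structural interface that bundled and third-party readers both satisfy. The sketch below uses `typing.Protocol`; `VirtualReader`, `can_open`, `get_metadata`, `NicheFormatReader`, `pick_reader`, and the `.niche` extension are all hypothetical names chosen for illustration:

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class VirtualReader(Protocol):
    """Hypothetical interface any reader (bundled or user-supplied) would implement."""

    def can_open(self, filepath: str) -> bool: ...

    def get_metadata(self, filepath: str) -> dict: ...


class NicheFormatReader:
    """Example user-supplied reader for a made-up '.niche' format."""

    def can_open(self, filepath: str) -> bool:
        return filepath.endswith(".niche")

    def get_metadata(self, filepath: str) -> dict:
        # A real reader would scan the file here and return chunk references.
        return {"source": filepath, "format": "niche"}


def pick_reader(filepath: str, readers: Iterable[VirtualReader]) -> VirtualReader:
    """Return the first reader that claims it can open the file."""
    for reader in readers:
        if reader.can_open(filepath):
            return reader
    raise ValueError(f"no registered reader can open {filepath!r}")


reader = pick_reader("obs.niche", [NicheFormatReader()])
```

Because `Protocol` is structural, a third-party reader does not need to import or subclass anything; it only has to provide the two methods (and `isinstance` with `runtime_checkable` checks only that those methods exist, not their signatures).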

> But do you need to pass a function name at that level?

In general I feel like we should be trying to move towards an entrypoint system, analogous to xarray.open_dataset's BackendEntrypoint class, which is what is accessed when you use the engine=... kwarg to xr.open_dataset.

TomNicholas added the documentation label ("Improvements or additions to documentation") on Sep 13, 2024

TomNicholas mentioned this issue on Sep 25, 2024 in: A better name for the ".virtualize" accessor? #241 (open)

TomNicholas mentioned this issue on Oct 3, 2024 in: Make readers pluggable via entrypoint system #245 (closed)
