Improvements to the DMR++ parser #230

Open
TomNicholas opened this issue Aug 26, 2024 · 4 comments
Labels
documentation (Improvements or additions to documentation), references generation (Reading byte ranges from archival files)

Comments

@TomNicholas
Member

The DMR++ parser was merged in #133, but there are a few ways it could be improved.

  1. Docs. It's not actually listed anywhere publicly that DMR++ files are supported, not even in the docstring of open_virtual_dataset.
  2. HDF4 support (Support HDF4? #216)
  3. Use ChunkManifest.from_arrays, which should increase performance and will reduce reliance on the kerchunk in-memory format (open_virtual_dataset with dmr++ #113 (comment))
  4. Internal code improvements, e.g.:
    a. Use pathlib module instead of os internally
    b. Refactor to be more functional, see open_virtual_dataset with dmr++ #113 (comment)
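
As an illustration of item 4a, here is a minimal sketch of what swapping os.path calls for pathlib could look like. The helper name data_path_from_dmrpp and the assumption that a DMR++ sidecar is named by appending ".dmrpp" to the data file name are hypothetical and not taken from the parser's actual code.

```python
from pathlib import Path


def data_path_from_dmrpp(dmrpp_path: str) -> Path:
    """Hypothetical helper: derive the data file path from a DMR++ sidecar path.

    Assumes the sidecar is named "<data file>.dmrpp"; this naming convention
    is an assumption made for the example.
    """
    path = Path(dmrpp_path)
    if path.suffix != ".dmrpp":
        raise ValueError(f"expected a .dmrpp file, got {path.name!r}")
    # with_suffix("") drops only the last suffix, so "granule.nc.dmrpp"
    # becomes "granule.nc" (the os.path.splitext equivalent).
    return path.with_suffix("")


print(data_path_from_dmrpp("data/granule.nc.dmrpp"))
```

One caveat: pathlib targets local filesystem paths, so remote locations such as s3:// URLs would need separate handling rather than being passed through Path.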

cc @ayushnag @betolink

@TomNicholas added the documentation and references generation labels on Aug 26, 2024
@Mikejmnez

I would like to be involved in some of this work. I can definitely work to better understand the complexities of HDF4 and the steps needed to enable HDF4 support.

@TomNicholas
Member Author

@ayushnag is there a way to identify a DMR++ file automatically? e.g. a file magic?

@ayushnag
Contributor

Not to my knowledge. All valid XML files must start with the string "<?xml"; however, beyond that I think some of the header tags would need to be read (e.g. xmlns:dmrpp="http://xml.opendap.org/dap/dmrpp/1.0.0#") to know it is a dmrpp file.

cc @Mikejmnez @jgallagher59701

@Mikejmnez

@ayushnag is right. The first four elements are not enough to distinguish a generic XML file from a dmrpp-generated one.
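
The header check discussed above could be sketched roughly as follows, using only the Python standard library. The function name looks_like_dmrpp is hypothetical, and the sketch assumes that every DMR++ file declares exactly the namespace URI quoted earlier in this thread on its root element. Other DMR++ versions may use a different URI. It also does not rely on the "<?xml" declaration being present.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URI quoted in the discussion; treating it as the only valid
# DMR++ namespace is an assumption of this sketch.
DMRPP_NS = "http://xml.opendap.org/dap/dmrpp/1.0.0#"


def looks_like_dmrpp(source) -> bool:
    """Return True if the XML root element declares the DMR++ namespace.

    `source` may be a filename or a binary file-like object.
    """
    try:
        # "start-ns" events for the root's xmlns declarations are reported
        # before the root's own "start" event, so parsing can stop early.
        for event, payload in ET.iterparse(source, events=("start-ns", "start")):
            if event == "start-ns":
                _prefix, uri = payload
                if uri == DMRPP_NS:
                    return True
            else:
                # Root element reached without seeing the namespace.
                return False
    except ET.ParseError:
        # Not well-formed XML (for example a binary HDF5 or netCDF file).
        return False
    return False


sample = io.BytesIO(
    b'<Dataset xmlns:dmrpp="http://xml.opendap.org/dap/dmrpp/1.0.0#"/>'
)
print(looks_like_dmrpp(sample))
```

Because the parse stops at the root element, only the start of the file is read, which keeps the check cheap even for large DMR++ documents.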
