rekx interfaces seamlessly with
the Kerchunk library [@Durant2023]
interactively, through the command line.
It assists in creating virtual aggregate datasets,
also known as Kerchunk reference sets,
which allow efficient, parallel and cloud-friendly
in-situ access to data without duplicating the original datasets.
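To make the idea of a reference set concrete, here is a minimal, simplified sketch in plain Python of what such a set holds and how it enables in-situ reads. It assumes an uncompressed toy binary file and hypothetical chunk keys (`temperature/0`, `temperature/1`). Real Kerchunk references generated from NetCDF/HDF5 files use the same `version`/`refs` layout as fsspec's reference file system, but they also carry Zarr metadata (such as `.zarray`), usually point to compressed chunks, and are read through fsspec and Zarr rather than hand-written code like `read_chunk` below.

```python
import json
import os
import struct
import tempfile

# Toy "original file": 8 float32 values stored as two contiguous,
# uncompressed chunks of 4 values each (little-endian).
tmpdir = tempfile.mkdtemp()
source_path = os.path.join(tmpdir, "source.bin")
with open(source_path, "wb") as f:
    f.write(struct.pack("<8f", *[float(i) for i in range(8)]))

CHUNK_LEN = 4                 # values per chunk
CHUNK_BYTES = CHUNK_LEN * 4   # float32 = 4 bytes

# A reference set laid out like fsspec/Kerchunk "version 1" references:
# each (hypothetical) chunk key maps to [url, byte offset, byte length].
references = {
    "version": 1,
    "refs": {
        "temperature/0": [source_path, 0, CHUNK_BYTES],
        "temperature/1": [source_path, CHUNK_BYTES, CHUNK_BYTES],
    },
}

# The reference set is small metadata that can be stored as JSON;
# the data themselves stay where they are, in the original file.
serialized = json.dumps(references)

def read_chunk(refs, key):
    """Read one chunk in place, using only the byte range stored in refs."""
    url, offset, length = refs["refs"][key]
    with open(url, "rb") as f:
        f.seek(offset)
        raw = f.read(length)
    return list(struct.unpack(f"<{length // 4}f", raw))

print(read_chunk(json.loads(serialized), "temperature/1"))  # [4.0, 5.0, 6.0, 7.0]
```

Only the byte range of the requested chunk is read, and nothing is copied out of the original file; aggregating many files then amounts to merging such mappings.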
Beyond being a functional tool,
rekx serves an educational purpose on
chunking, compression and efficient data reading
from common scientific file formats such as NetCDF,
which are used extensively to store large time series.
While there is abundant documentation on these topics,
it is often highly technical
and oriented towards developers.
rekx tries to simplify these concepts through practical examples.
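As a small worked example of why chunking matters: compressed NetCDF4/HDF5 chunks are read and decompressed as whole units, so the cost of a selection depends largely on how many chunks it intersects. The sketch below assumes a hypothetical hourly float32 series for one (non-leap) year on a 100 x 100 grid, shape (8760, 100, 100), ignores compression ratios and caching, and compares a map-oriented chunk shape with a time-series-oriented one. The helpers `chunks_touched` and `bytes_read` are illustrative and not part of rekx.

```python
import math

def chunks_touched(selection, chunk_shape):
    """Number of chunks intersected by a rectangular selection.

    selection: one (start, stop) pair per dimension, stop exclusive.
    chunk_shape: chunk length along each dimension.
    """
    count = 1
    for (start, stop), size in zip(selection, chunk_shape):
        first_chunk = start // size
        last_chunk = (stop - 1) // size
        count *= last_chunk - first_chunk + 1
    return count

def bytes_read(selection, chunk_shape, itemsize=4):
    """Uncompressed bytes decoded, since every touched chunk is read whole."""
    return chunks_touched(selection, chunk_shape) * math.prod(chunk_shape) * itemsize

# Hypothetical layouts for a (time, lat, lon) = (8760, 100, 100) float32 variable
layouts = {
    "map-oriented (1, 100, 100)": (1, 100, 100),
    "time-series-oriented (8760, 10, 10)": (8760, 10, 10),
}
selections = {
    "one pixel, full year": ((0, 8760), (50, 51), (50, 51)),
    "one map, single hour": ((0, 1), (0, 100), (0, 100)),
}

for layout_name, chunk_shape in layouts.items():
    for selection_name, selection in selections.items():
        print(
            f"{layout_name:38} {selection_name:22} "
            f"chunks={chunks_touched(selection, chunk_shape):5} "
            f"bytes={bytes_read(selection, chunk_shape):,}"
        )
```

Reading one pixel's full year touches 8760 chunks with the map-oriented layout but a single chunk with the time-series-oriented one, while a single map shows the opposite (1 chunk versus 100). No layout suits every access pattern, so the chunk shape should follow the dominant way the data are read.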
Similarly,
existing tools for managing HDF and NetCDF data,
such as cdo, nco and others,
often have overlapping functionality
and present a steep learning curve for non-experts.
rekx focuses on the practical aspects of efficient data access
and tries to simplify these processes.
It features simple command line tools to:
- diagnose data structures
- validate uniform chunking across files
- suggest good chunking shapes
- parameterise the rechunking of datasets
- create and aggregate Kerchunk reference sets
- time data read operations for performance analysis
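Two of the tasks listed above, validating uniform chunking across files and suggesting a chunk shape, can be illustrated in a few lines of plain Python. This is a sketch, not rekx's implementation: the per-file chunk shapes are hard-coded with hypothetical file and variable names (in practice they could be read, for example, from netCDF4's `Variable.chunking()` or from xarray's `encoding["chunksizes"]`), and the suggestion is a simple, assumed heuristic that keeps the full time axis in each chunk and picks the largest square spatial tile under a target chunk size. Uniformity matters because a combined Kerchunk reference set is exposed as a Zarr store, and each Zarr array has a single chunk shape.

```python
import math
from collections import defaultdict

def find_nonuniform_chunking(chunking_per_file):
    """Map each variable whose chunk shape differs between files to
    {chunk_shape: [files using it]}; uniformly chunked variables are omitted."""
    shapes = defaultdict(lambda: defaultdict(list))
    for filename, variables in chunking_per_file.items():
        for variable, chunk_shape in variables.items():
            shapes[variable][tuple(chunk_shape)].append(filename)
    return {var: dict(by_shape) for var, by_shape in shapes.items() if len(by_shape) > 1}

def suggest_time_series_chunks(shape, itemsize, target_bytes):
    """Heuristic: keep the whole time axis in each chunk and use the largest
    square spatial tile that keeps the chunk within target_bytes.
    If one pixel's full series already exceeds target_bytes, it falls back
    to a 1 x 1 tile (and the chunk then exceeds the target)."""
    time_len, lat_len, lon_len = shape
    values_per_chunk = target_bytes // itemsize
    side = max(1, math.isqrt(values_per_chunk // time_len))
    return (time_len, min(side, lat_len), min(side, lon_len))

# Hypothetical chunk shapes, as they might be reported for three yearly files.
chunking = {
    "data_2021.nc": {"temperature": (8760, 10, 10), "time": (512,)},
    "data_2022.nc": {"temperature": (8760, 10, 10), "time": (512,)},
    "data_2023.nc": {"temperature": (1, 100, 100), "time": (512,)},
}

print(find_nonuniform_chunking(chunking))
print(suggest_time_series_chunks((8760, 100, 100), itemsize=4, target_bytes=4 * 2**20))
```

Here only `temperature` is reported, because `data_2023.nc` uses a different chunk shape, and the heuristic suggests (8760, 10, 10) for a 4 MiB target. Files flagged this way would need rechunking before their references can be combined.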
rekx is dedicated to practicality, simplicity and essence.
Interested? Head over to the documentation.
- Complete backend for rechunking, support for
  - NetCDF4
  - Xarray
  - nccopy
- Simplify command line interface
  - merge "multi" commands into single/simple ones?
  - make common-shape and validate options of shapes?
  - clean up the nonsensical suggest-alternative command or merge it into suggest
  - merge reference-parquet into reference
  - as above, same for/with combine commands
  - does a separate select-fast make sense?
  - review various select/read commands
- Go through:
  - Write clean and meaningful docstrings for each and every function
  - Pytest each and every (?) function
- Packaging
- Documentation
  - Use https://squidfunk.github.io/mkdocs-material/
  - Simple examples
    - Diagnose
    - Suggest
    - Rechunk
    - Kerchunk
      - JSON
        - Create references
        - Combine references
        - Read data from aggregated reference and load in memory
      - Parquet
        - Create references
        - Combine references
        - Read data from aggregated reference and load in memory
          - Pending issue fsspec/kerchunk#345 (comment)
      - JSON
    - Select (aka read)
      - From Xarray-supported datasets
      - From Kerchunk references
  - Tutorial
    - Rechunking and Kerchunking SARAH3 products
  - Add visuals to Concepts
Footnotes
- Original T-Rex drawn by pikisuperstar on Freepik