Manage large vectors of bits types in Julia. A thin wrapper for mmapped binary data, with a few sanity checks and convenience functions.
For each dataset, the columns (vectors of equal length) and metadata are stored in a directory like this:
dir/
├── layout.jld2
├── meta/
│ └ ...
├── 1.bin
├── 2.bin
├── ...
├── ...
└── ...
The file layout.jld2 specifies the number and types of columns (using JLD2.jl), and the total number of elements. The $i.bin files contain the data for each column, which can be memory mapped.
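As a rough illustration of what memory mapping a column file involves, the sketch below writes a small vector to a file named 1.bin and maps it back using only Julia's Mmap standard library. It does not use this package's API; the temporary directory and the sample data are made up for the example.

```julia
using Mmap

# Hypothetical dataset directory and column data (not created by this library).
dir = mktempdir()
col = Float64[1.0, 2.0, 3.0]
path = joinpath(dir, "1.bin")

# Write the raw bits of the column to dir/1.bin.
write(path, col)

# Memory map the file as a Vector{Float64}; the element count must be known,
# which is why the layout file records the total number of elements.
mapped = Mmap.mmap(path, Vector{Float64}, (length(col),))
```

The file holds only raw bytes, so the element type and count have to come from somewhere else, which is the role the layout file plays in the directory structure above.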
Additional metadata can be saved in files in the directory meta. This is ignored by this library; use the function meta_path to calculate paths relative to dir/meta.
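The following sketch shows the kind of path computation this implies. The helper meta_file is a hypothetical stand-in written for this example; the library's own meta_path may take different arguments, so check its docstring before relying on it.

```julia
dir = mktempdir()   # hypothetical dataset directory

# Hypothetical helper mirroring the described layout; NOT the library's meta_path.
meta_file(dir, relpath) = joinpath(dir, "meta", relpath)

path = meta_file(dir, "notes.txt")
mkpath(dirname(path))              # create dir/meta if it does not exist yet
write(path, "source: example\n")   # any file format works here
```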
Two interfaces are provided. Use SinkColumns for an ex ante unknown number of elements, written sequentially. This is useful for ingesting data. MmappedColumns is useful when the number of records is known and fixed.
Types for the columns are specified as Tuples. See the docstrings for both interfaces and the unit tests for examples.
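To make the two access patterns concrete, here is a minimal sketch in plain Julia that does not call SinkColumns or MmappedColumns. The function ingest and the sample records are hypothetical. The sketch appends records of an initially unknown count to one file per column, then memory maps the columns once the count is known. Column types are given as a Tuple type, following the convention described above.

```julia
using Mmap

# Hypothetical helper: append records sequentially, one file per column,
# and return the number of records written.
function ingest(records, ::Type{S}, dir) where {S <: Tuple}
    coltypes = fieldtypes(S)
    all(isbitstype, coltypes) || error("columns must be bits types")
    ios = [open(joinpath(dir, "$(i).bin"), "w") for i in 1:length(coltypes)]
    n = 0
    try
        for rec in records
            for (io, T, x) in zip(ios, coltypes, rec)
                write(io, convert(T, x))   # raw bits of one element
            end
            n += 1
        end
    finally
        foreach(close, ios)
    end
    return n
end

S = Tuple{Int64, Float64}
dir = mktempdir()                      # hypothetical dataset directory
records = ((i, i / 2) for i in 1:5)    # stands in for a stream of unknown length
n = ingest(records, S, dir)

# Once the count is fixed, each column can be memory mapped.
cols = [Mmap.mmap(joinpath(dir, "$(i).bin"), Vector{T}, (n,))
        for (i, T) in enumerate(fieldtypes(S))]
```

The sketch omits the layout file and the sanity checks that the library provides. It is only meant to show why a sequential sink suits ingestion and a fixed-length mapping suits later access.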
Work on this library was supported by the Austrian National Bank Jubiläumsfonds grant #17378.