Handle large columns (vectors of equal length) with bits types in Julia using mmap.

tpapp/LargeColumns.jl

LargeColumns

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Manage large vectors of bits types in Julia. A thin wrapper for mmapped binary data, with a few sanity checks and convenience functions.

Specification

For each dataset, the columns (vectors of equal length) and metadata are stored in a directory like this:

dir/
├── layout.jld2
├── meta/
│   └ ...
├── 1.bin
├── 2.bin
├── ...
├── ...
└── ...

The file layout.jld2 specifies the number and types of columns (using JLD2.jl), and the total number of elements. The $i.bin files contain the data for each column, which can be memory mapped.
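To illustrate what "memory mapped" means for a `$i.bin` file, here is a minimal sketch using only Julia's standard `Mmap` library. It does not use LargeColumns.jl itself, and the temporary directory, element type, and values are assumptions chosen for the example.

```julia
using Mmap

# Hypothetical stand-in for a dataset directory; only the `1.bin` naming
# follows the layout described above.
dir = mktempdir()
path = joinpath(dir, "1.bin")
column = collect(1.0:5.0)          # five Float64 values

# Write the raw bits of the column, with no header.
open(path, "w") do io
    write(io, column)
end

# Map the file back as a vector. The element type and length must be
# supplied by the caller, which is the kind of information a layout file
# can record.
mapped = open(path, "r") do io
    Mmap.mmap(io, Vector{Float64}, length(column))
end

@assert mapped == column
```

Because the binary file carries no type or length information, reading it back correctly depends on metadata stored elsewhere.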

Additional metadata can be saved as files in the directory meta. These files are ignored by this library; use the function meta_path to calculate paths relative to dir/meta.

Interfaces

Two interfaces are provided. Use SinkColumns for an ex ante unknown number of elements, written sequentially. This is useful for ingesting data.

MmappedColumns is useful when the number of records is known and fixed.
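The difference between the two situations can be sketched with plain Base I/O. This is not the SinkColumns or MmappedColumns API; the file name and values below are hypothetical, and the point is only that sequential writing does not need the element count in advance, while fixed-length access does.

```julia
# Hypothetical column file in a temporary directory.
path = joinpath(mktempdir(), "1.bin")

# Ingestion: write values as they arrive, without knowing the total.
open(path, "w") do io
    for x in (3, 1, 4, 1, 5)
        write(io, Int64(x))
    end
end

# Afterwards, the element count can be recovered from the file size.
n = filesize(path) ÷ sizeof(Int64)

# With the count known and fixed, the column can be read as a vector.
col = Vector{Int64}(undef, n)
read!(path, col)

@assert n == 5
```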

Types for the columns are specified as Tuples. See the docstrings for both interfaces and the unit tests for examples.
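As a rough illustration of a Tuple-based column specification, the sketch below uses only Base Julia. The particular column types and record count are assumptions, and how the library consumes such a type is documented in its docstrings rather than here.

```julia
# Hypothetical specification of three columns as a Tuple type.
S = Tuple{Int64, Float64, UInt8}

# Extract the element type of each column.
coltypes = fieldtypes(S)

# Each column must have a bits type to be stored as raw binary.
@assert all(isbitstype, coltypes)

# Size in bytes of each column file for a given number of records.
n = 1_000
bytes = [n * sizeof(T) for T in coltypes]

@assert bytes == [8000, 8000, 1000]
```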

Acknowledgments

Work on this library was supported by the Austrian National Bank Jubiläumsfonds grant #17378.
