Three simple things this library does for you:
- Guess the media type from raw bytes,
- Parse its dimensions, sizes, lengths, etc.,
- Unpack data into regular preallocated Tensors.
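To make the second step concrete, here is a minimal Python sketch that reads the width and height from a PNG header without decoding any pixels. It is an illustration only, not TenPack's API: the function name `png_dimensions` is hypothetical, and the offsets come from the PNG format itself (an 8-byte signature followed by the `IHDR` chunk).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_dimensions(data: bytes):
    """Return (width, height) from a PNG header (hypothetical helper)."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then the IHDR payload, which starts with width and height.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG byte stream")
    if data[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    # Width and height are big-endian unsigned 32-bit integers at offset 16.
    width, height = struct.unpack_from(">II", data, 16)
    return width, height
```

Knowing the dimensions up front is what lets the caller size the output tensor before any unpacking happens.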
Where do we use it? To connect the Data Storage layer of UKV to High-Performance Computing libraries like TensorFlow and PyTorch.
Most common file formats have "signatures" or "magic numbers" embedded into them, often as the prefix of the byte stream.
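A prefix check of this kind can be sketched in a few lines of Python. The table and the function name `guess_media_type` below are hypothetical and cover only a handful of well-known signatures; this is not how TenPack itself is implemented.

```python
# Hypothetical signature table: (byte prefix, media type).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"fLaC", "audio/flac"),
    (b"OggS", "application/ogg"),
]

def guess_media_type(data: bytes):
    """Return the media type whose signature prefixes `data`, or None."""
    for prefix, media_type in SIGNATURES:
        if data.startswith(prefix):
            return media_type
    return None
```

Only the first few bytes are inspected, so the check costs the same for a small thumbnail as for a large video file.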
Libraries for the first step already exist in other languages:
- filetype for GoLang
- filetype.py for Python
- FileType for Elixir
- FileSignatures for C#
- Pillow and Pillow-SIMD for image formats
- FFmpeg for video formats
- Nyquist for audio formats
In fact, TenPack is just a CMake-friendly generalization of those libraries, with a C interface and a focus on memory reuse.