Discussion: Compression and Splines #609

jlaura · 2024-05-31T17:10:06Z

With #604 and #605, compression is being added for ISDs. This issue is to discuss how this might propagate across the stack.

Here are some nominal requirements:

Ability to compress/decompress ISDs (done in Enhancement: Support for ISD Compression #604 and Fix disabled tests #605).
Ability to use the compressed ISD straight to memory (no need to decompress to disk).
Ability to write an updated compressed ISD to disk after bundle adjustment.
Ability to perform adjustment tasks at scale using the CSM. Can one load 1000 ISDs efficiently? 10,000? 100,000?
- This potentially relates to storing fewer ephemeris points inside the ISD.
An API for consumers of ale to use in their library for working with compressed ISDs.
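
As a rough illustration of the first two requirements, here is a minimal sketch of an in-memory round trip. It assumes an ISD is a JSON-serializable dict and uses `zlib` from the Python standard library purely as a stand-in for whichever codec #604 uses. The function names `compress_isd` and `decompress_isd` and the sample field names are hypothetical, not part of ale.

```python
import json
import zlib


def compress_isd(isd: dict, level: int = 6) -> bytes:
    """Serialize an ISD dict to compact JSON and compress it in memory."""
    text = json.dumps(isd, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"), level)


def decompress_isd(blob: bytes) -> dict:
    """Decompress an in-memory blob back into an ISD dict (no disk access)."""
    return json.loads(zlib.decompress(blob))


# Hypothetical, heavily trimmed ISD used only for illustration.
isd = {
    "name_model": "EXAMPLE_SENSOR_MODEL",
    "ephemeris_times": [0.0, 0.5, 1.0],
    "positions": [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [2.0, 3.0, 4.0]],
}

blob = compress_isd(isd)
restored = decompress_isd(blob)
```

Nothing here touches the filesystem, so the same two calls could sit behind a bytestream-style ISD as well as a file-based one.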

Kelvinrr · 2024-06-04T17:29:02Z

Reading through the CSM standard, I don't see any mention of the file format. I haven't read the whole thing but here is a part of the text (CSM 3.1 TRD page 22) on ISDs:

Providing image support data to the sensor model selection and construction
functions. An Isd class object is provided when processing an image from native file
format (or when a sensor model state is not available).
Note that the following convention should be observed by the Application when
constructing Isd objects. The Application should create ISD standard forms such as
NITF 2.0 or 2.1, if possible. The next preferred form is BYTESTREAM, followed by
FILENAME. Some plug-ins may not support file access operations.

Page 63 continues to talk about "Filename ISD" support but says nothing about its format.

Considering file reading isn't a requirement, we might be able to get away with a novel ISD format. I wasn't involved in early convos for the ISD format and why it's JSON. But I imagine that is not enforced in the standard and we just chose one? That is to say, if we wanted to create a second compressed ISD format, it seems we could.

Kelvinrr · 2024-06-04T17:49:24Z

On the topics:

Ability to use the compressed ISD straight to memory (no need to decompress to disk).

So I think the way to do this is using a memory-mapped file, as that would give us the fastest reading time. There are libraries out there that can handle all the nuances with binary compatibility across OSes and architectures. I had success in SpiceQL with this to reduce kernel loading query from 20,000ms (straight JSON) to 5ms (MMAPed tables). The downside is you don't get compression, but if there is a C++ interface to Brotli (or whatever) compression that lets us decompress bytes in memory and avoid extra copies, we could read in the raw bytes and decompress to something in memory.

Theoretically, I think a novel file format that goes compressed bytes -> mmaped on IO into a byte array -> decompressed in memory could still be faster than straight reading of a large mmaped file 🤔 This all hinges on off-the-shelf libraries that support decompressing bytes. Edit: potential options? https://github.com/NewYaroslav/brotli-hpp and https://github.com/vimpunk/mio
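
The compressed bytes -> mmap -> decompress-in-memory pipeline can be sketched in a few lines. This is only an illustration in Python using the standard-library `mmap` and `zlib` modules, not the C++ libraries linked above. `zlib` stands in for Brotli, and `load_compressed_isd` and the file name are hypothetical.

```python
import json
import mmap
import os
import tempfile
import zlib


def load_compressed_isd(path: str) -> dict:
    """Map a compressed ISD file and decompress it without an intermediate file."""
    with open(path, "rb") as f:
        # Map the whole file read-only; the OS pages bytes in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # zlib reads directly from the mapping through the buffer protocol,
            # so the compressed bytes are not copied into a separate bytes object.
            raw = zlib.decompress(mm)
    return json.loads(raw)


# Hypothetical ISD content used only for illustration.
isd = {"ephemeris_times": [0.0, 0.5, 1.0], "focal_length": 500.0}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.isd.z")
    with open(path, "wb") as f:
        f.write(zlib.compress(json.dumps(isd).encode("utf-8")))
    loaded = load_compressed_isd(path)
```

Whether this beats reading a large uncompressed mapped file would need measuring on real ISDs; the sketch only shows that no decompressed copy has to be written to disk.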

Ability to write an updated compressed ISD to disk after bundle adjustment.
An API for consumers of ale to use in their library for working with compressed ISDs.

Whatever format we use above would have to unpack to something other than JSON (e.g. some kind of efficient hash map that is not the STL's, since that one is notoriously slow for what it is; header-only implementations are out there). We could then hide the implementation under a basic object that others could use and that allows updates, and expose that in Python.
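
To make the "basic object" idea concrete, here is a minimal sketch of what such a wrapper might look like from the Python side. Everything in it is an assumption for illustration: the class name `IsdHandle`, its methods, and the field names are hypothetical, a plain dict stands in for the faster hash map, and JSON plus `zlib` stand in for the eventual on-disk encoding.

```python
import json
import zlib
from typing import Any


class IsdHandle:
    """Hypothetical wrapper that hides how ISD fields are stored and encoded."""

    def __init__(self, fields: dict):
        # Copy so that callers cannot mutate the internal store by accident.
        self._fields = dict(fields)

    @classmethod
    def from_compressed(cls, blob: bytes) -> "IsdHandle":
        """Build a handle from compressed bytes held in memory."""
        return cls(json.loads(zlib.decompress(blob)))

    def get(self, key: str, default: Any = None) -> Any:
        return self._fields.get(key, default)

    def set(self, key: str, value: Any) -> None:
        """Update a field, e.g. with values produced by a bundle adjustment."""
        self._fields[key] = value

    def to_compressed(self) -> bytes:
        """Encode the current state as compressed bytes, ready to write to disk."""
        return zlib.compress(json.dumps(self._fields).encode("utf-8"))


handle = IsdHandle({"ephemeris_times": [0.0, 1.0], "focal_length": 500.0})
handle.set("focal_length", 500.25)  # pretend this came from an adjustment
reloaded = IsdHandle.from_compressed(handle.to_compressed())
```

Because callers only see `get`, `set`, and the two encoding methods, the storage and the codec could be swapped later without changing consumer code.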

thareUSGS · 2024-06-04T19:23:19Z

Page 63 continues to talk about "Filename ISD" support but says nothing about its format.

There is no standard ISD format. This was done on purpose to not limit the camera type or metadata needed. That said, most Earth-centric implementations combine the ISD and image pixels into the National Imagery Transmission Format (NITF). While NITF was researched for the usgscsm library and planetary data, it was quickly discovered that most applications which support the NITF format assume an Earth-based WGS84 reference ellipsoid (and thus it was not used for our planetary use case). For the Earth-side and NITF, a little more information is here (including RPC support in NITF).

Kelvinrr · 2024-06-04T20:52:18Z

@thareUSGS I saw how the standard seemed to suggest preferring the NITF format over others, but it didn't seem to be something we would be supporting.
