Discussion: Compression and Splines #609

Open
6 tasks
jlaura opened this issue May 31, 2024 · 4 comments

Comments

@jlaura
Collaborator

jlaura commented May 31, 2024

With #604 and #605, compression is being added for ISDs. This issue is to discuss how that might propagate across the stack.

Here are some nominal requirements:

  • Ability to compress/decompress ISDs (done in Enhancement: Support for ISD Compression #604 and Fix disabled tests #605).
  • Ability to use the compressed ISD straight to memory (no need to decompress to disk).
  • Ability to write an updated compressed ISD to disk after bundle adjustment.
  • Ability to perform adjustment tasks at scale using the CSM. Can one load 1000 ISDs efficiently? 10,000? 100,000?
    • This potentially relates to storing fewer ephemeris points inside the ISD.
  • An API for consumers of ale to use in their libraries for working with compressed ISDs.
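
A minimal sketch of the in-memory round trip described in the requirements above. It assumes `zlib` from the Python standard library as a stand-in for whichever codec ale's compression actually uses, and the function names and toy ISD keys are hypothetical, not ale's API.

```python
import json
import zlib


def compress_isd(isd: dict, level: int = 6) -> bytes:
    """Serialize an ISD dict to compact JSON and compress it in memory."""
    raw = json.dumps(isd, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level)


def decompress_isd(blob: bytes) -> dict:
    """Decompress bytes back to an ISD dict without touching disk."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical toy ISD; real ISDs carry many more fields.
isd = {
    "name_model": "EXAMPLE_SENSOR_MODEL",
    "ephemeris_times": [i * 0.5 for i in range(1000)],
    "positions": [[float(i), 0.0, 0.0] for i in range(1000)],
}

blob = compress_isd(isd)
restored = decompress_isd(blob)
assert restored == isd
print(len(json.dumps(isd)), "bytes of JSON ->", len(blob), "bytes compressed")
```

The size printed at the end gives a rough feel for how much the ephemeris arrays dominate the payload, which is where storing fewer points would help most.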
@Kelvinrr
Collaborator

Kelvinrr commented Jun 4, 2024

Reading through the CSM standard, I don't see any mention of the file format. I haven't read the whole thing but here is a part of the text (CSM 3.1 TRD page 22) on ISDs:

Providing image support data to the sensor model selection and construction
functions. An Isd class object is provided when processing an image from native file
format (or when a sensor model state is not available).
Note that the following convention should be observed by the Application when
constructing Isd objects. The Application should create ISD standard forms such as
NITF 2.0 or 2.1, if possible. The next preferred form is BYTESTREAM, followed by
FILENAME. Some plug-ins may not support file access operations.

Page 63 continues to talk about "Filename ISD" support but says nothing about its format.

Considering file reading isn't a requirement, we might be able to get away with a novel ISD format. I wasn't involved in early convos for the ISD format and why it's JSON. But I imagine that is not enforced in the standard and we just chose one? That is to say, if we wanted to create a second compressed ISD format, it seems we could.

@Kelvinrr
Collaborator

Kelvinrr commented Jun 4, 2024

On the topics:

Ability to use the compressed ISD straight to memory (no need to decompress to disk).

So I think the way to do this is with a memory-mapped file, as that would give us the fastest read time. There are libraries out there that handle the nuances of binary compatibility across OSes and architectures. I had success with this in SpiceQL, reducing a kernel-loading query from 20,000 ms (straight JSON) to 5 ms (mmapped tables). The downside is that you don't get compression, but if there is a C++ interface to Brotli (or whatever) that lets us decompress bytes in memory and avoid extra copies, we could read in the raw bytes and decompress to something in memory.

Theoretically, I think a novel file format that goes compressed bytes -> mmapped on IO into a byte array -> decompressed in memory could still be faster than directly reading a large mmapped file 🤔 This all hinges on off-the-shelf libraries that support decompressing bytes. Edit: potential options? https://github.com/NewYaroslav/brotli-hpp and https://github.com/vimpunk/mio
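
The compressed bytes -> mmap -> decompress-in-memory pipeline can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard `mmap` and `zlib` modules instead of the C++ libraries linked above (mio, brotli-hpp), the payload is still JSON, and the function names and file name are hypothetical.

```python
import json
import mmap
import os
import tempfile
import zlib


def write_compressed(path: str, isd: dict) -> None:
    """Write an ISD dict to disk as zlib-compressed JSON bytes."""
    with open(path, "wb") as f:
        f.write(zlib.compress(json.dumps(isd).encode("utf-8")))


def load_compressed_mmap(path: str) -> dict:
    """Map the compressed file and decompress directly from the mapping.

    zlib.decompress accepts any bytes-like object, so the mapped pages are
    read in place rather than first being copied into a bytes object.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            raw = zlib.decompress(mapped)
    return json.loads(raw)


# Round trip through a temporary file (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_isd.z")
    write_compressed(path, {"ephemeris_times": [0.0, 0.5, 1.0]})
    loaded = load_compressed_mmap(path)

assert loaded == {"ephemeris_times": [0.0, 0.5, 1.0]}
```

Whether this beats reading a large uncompressed mapped file depends on the codec's decompression speed relative to disk throughput, so it would need benchmarking on real ISDs.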

Ability to write an updated compressed ISD to disk after bundle adjustment.
An API for consumers of ale to use in their libraries for working with compressed ISDs.

Whatever format we use above would have to unpack to something other than JSON (e.g. some kind of efficient hash map that isn't the STL's, since that one is notoriously slow for what it is; header-only implementations are out there), and maybe hide the implementation behind a basic object that others can use and that allows updates. Then expose that in Python.
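
One possible shape for the "basic object" idea, seen from the Python side. Everything here is hypothetical: the class name `CompressedIsd`, its methods, and the use of a plain `dict` with `zlib`-compressed JSON as the backing store are assumptions for illustration, not an existing ale interface or the binary format proposed above.

```python
import json
import zlib


class CompressedIsd:
    """Hypothetical wrapper that hides the storage format from callers."""

    def __init__(self, data: dict):
        self._data = dict(data)  # shallow copy so callers can't alias top-level keys

    @classmethod
    def from_bytes(cls, blob: bytes) -> "CompressedIsd":
        """Build the object from compressed bytes held in memory."""
        return cls(json.loads(zlib.decompress(blob)))

    def get(self, key, default=None):
        return self._data.get(key, default)

    def update(self, key, value) -> None:
        """Replace one field, e.g. after a bundle adjustment."""
        self._data[key] = value

    def to_bytes(self) -> bytes:
        """Re-compress the current state, ready to be written to disk."""
        return zlib.compress(json.dumps(self._data).encode("utf-8"))


# Usage: load, adjust one field, and re-serialize.
original = CompressedIsd({"positions": [[1.0, 2.0, 3.0]]}).to_bytes()
isd = CompressedIsd.from_bytes(original)
isd.update("positions", [[1.5, 2.0, 3.0]])
updated = CompressedIsd.from_bytes(isd.to_bytes())
assert updated.get("positions") == [[1.5, 2.0, 3.0]]
```

Because callers only touch `get`, `update`, and `to_bytes`, the backing store could later be swapped for a faster map or a different codec without changing consumer code.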

@thareUSGS
Contributor

thareUSGS commented Jun 4, 2024

Page 63 continues to talk about "Filename ISD" support but says nothing about its format.

There is no standard ISD format. This was done on purpose so as not to limit the camera type or metadata needed. That said, most Earth-centric implementations combine the ISD and image pixels into the National Imagery Transmission Format (NITF). While NITF was researched for the usgscsm library and planetary data, it was quickly discovered that most applications that support the NITF format assume an Earth-based WGS84 reference ellipsoid (and thus it was not used for our planetary use case). For the Earth side and NITF, a little more information is here (including RPC support in NITF).

@Kelvinrr
Collaborator

Kelvinrr commented Jun 4, 2024

@thareUSGS I saw how the standard seemed to suggest preferring the NITF format over others, but it didn't seem to be something we would be supporting.
