Discussion: Compression and Splines #609
Comments
Reading through the CSM standard, I don't see any mention of the file format. I haven't read the whole thing but here is a part of the text (CSM 3.1 TRD page 22) on ISDs:
Page 63 continues to talk about "Filename ISD" support but says nothing about its format. Considering file reading isn't a requirement, we might be able to get away with a novel ISD format. I wasn't involved in the early convos about the ISD format or why it's JSON, but I imagine JSON is not enforced by the standard and we just chose it? That is to say, if we wanted to create a second, compressed ISD format, it seems we could.
On the topics:
So I think the way to do this is with a memory-mapped file, as that would give us the fastest reading time. There exist libraries out there that handle all the nuances of binary compatibility across OSes and architectures. I had success with this in SpiceQL, reducing a kernel-loading query from 20,000 ms (straight JSON) to 5 ms (mmapped tables). The downside is that you don't get compression, but if there exists a C++ interface to Brotli (or whatever) compression that lets us decompress bytes in memory and avoid extra copies, we could read in the raw bytes and decompress them to something in memory. Theoretically, I think a novel file format that goes compressed bytes -> mmapped on IO into a byte array -> decompressed in memory could still be faster than straight reading of a large mmapped file 🤔 This all hinges on off-the-shelf libraries that support decompressing bytes. Edit: potential options? https://github.com/NewYaroslav/brotli-hpp and https://github.com/vimpunk/mio
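A minimal sketch of that read path (compressed bytes on disk -> mmapped -> decompressed in memory), written in Python only to show the shape of the idea. It assumes zlib from the standard library as a stand-in for Brotli, and the function names and the `.isdz` extension are hypothetical, not anything in usgscsm or SpiceQL.

```python
import mmap
import zlib


def write_compressed(path, payload):
    """Write `payload` (bytes) as one compressed blob.

    Assumption: zlib stands in for Brotli here; the on-disk layout is
    just the raw compressed stream, with no header or versioning.
    """
    with open(path, "wb") as f:
        f.write(zlib.compress(payload, 9))


def read_compressed(path):
    """Memory-map the file and decompress directly from the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # zlib reads the mapped pages through the buffer protocol,
            # so the compressed bytes are not first copied into a
            # separate Python bytes object.
            return zlib.decompress(mm)
```

Usage would be along the lines of `write_compressed("image.isdz", raw)` followed by `raw = read_compressed("image.isdz")`. Whether this beats reading a large uncompressed mmapped file depends on the compression ratio and decompression speed, so it would need to be measured.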
Whatever format we use above would have to unpack to something other than JSON (e.g. some kind of efficient hash map that is not the STL's, since that one is notoriously slow for what it is; header-only implementations are out there). We could then hide the implementation behind a basic object that others can use and that allows updates, and expose that object in Python.
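As a rough illustration of "hide the implementation behind a basic object", here is a small Python sketch. Everything in it is hypothetical: the `IsdTable` name, the restriction to named arrays of doubles, and the little-endian binary layout are assumptions made for the example, not a proposed format.

```python
import struct
from array import array


class IsdTable:
    """Hypothetical facade: callers use get/set, never the storage layout."""

    def __init__(self):
        # name -> array of doubles; a Python dict is a hash map.
        self._fields = {}

    def set(self, name, values):
        """Add or update a field."""
        self._fields[name] = array("d", values)

    def get(self, name):
        return list(self._fields[name])

    def to_bytes(self):
        """Serialize as repeated records: key length, count, key, doubles."""
        out = bytearray()
        for name, vals in self._fields.items():
            key = name.encode("utf-8")
            out += struct.pack("<HI", len(key), len(vals))
            out += key
            out += struct.pack(f"<{len(vals)}d", *vals)
        return bytes(out)

    @classmethod
    def from_bytes(cls, blob):
        """Rebuild a table from the layout written by to_bytes."""
        table, pos = cls(), 0
        while pos < len(blob):
            key_len, count = struct.unpack_from("<HI", blob, pos)
            pos += struct.calcsize("<HI")
            name = blob[pos:pos + key_len].decode("utf-8")
            pos += key_len
            values = struct.unpack_from(f"<{count}d", blob, pos)
            pos += 8 * count
            table._fields[name] = array("d", values)
        return table
```

Because callers only touch `get`, `set`, `to_bytes`, and `from_bytes`, the backing container or byte layout could be swapped later without changing user code, which is the point of the facade.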
There is no standard ISD format. This was done on purpose so as not to limit the camera type or the metadata needed. That said, most Earth-centric implementations combine the ISD and image pixels into the National Imagery Transmission Format (NITF). While NITF was researched for the usgscsm library and planetary data, it was quickly discovered that most applications that support NITF assume an Earth-based WGS84 reference ellipsoid, so it was not used for our planetary use case. For the Earth-side and NITF, a little more information is here (including RPC support in NITF).
@thareUSGS I saw how the standard seemed to suggest preferring the NITF format over others, but it didn't seem to be something we would be supporting.
With #604 and #605, compression is being added for ISDs. This issue is to discuss how this might propagate across the stack.
Here are some nominal requirements: