-
-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Save PackedByteArrays as base64 encoded #89186
Save PackedByteArrays as base64 encoded #89186
Conversation
This breaks compatibility with the format right? Data generated with this can't be loaded in older versions, should we bump the text file version? |
Ah yes indeed. Older version cannot read it. If there's a version to increase somewhere I can do it I guess. |
It's right here:
|
d3b15e6
to
162f546
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally, it works as expected. This has been a longstanding issue, great work 🙂
Testing project: test_packed_byte_array_scenes.zip1
.tscn
file with 10,000,000 PackedByteArray elements (5 million are zeros, 5 million are random with a fixed seed):
Benchmark
OS: Fedora 39 x86_64
CPU: Intel Core i9-13900K (performance
governor)
SSD: Solidigm P44 Pro 2 TB
Saving time
Pressing Ctrl + S with only one scene open in the editor and measuring the time the progress dialog is visible with a stopwatch.
Binary (for reference) | Before | After (this PR) |
---|---|---|
0.4 seconds | 3.2 seconds | 1.1 seconds |
Loading time
Running the project and measuring the time spent on the splash screen with a stopwatch.
Binary (for reference) | Before | After (this PR) |
---|---|---|
0.5 seconds | 2.5 seconds | 1.4 seconds |
File size (on filesystem)
Binary (for reference) | Before | After (this PR) |
---|---|---|
4.8 MB2 | 36.4 MB | 12.7 MB |
File size (within a ZIP archive)
Binary (for reference) | Before | After (this PR) |
---|---|---|
4.8 MB2 | 6.4 MB | 4.8 MB |
Footnotes
While the PR is fine as is, if the scene format needs to be upgraded, it would be nice to do more now than later.
|
while (true) { | ||
if (token.type != TK_NUMBER) { | ||
bool valid = false; | ||
if (token.type == TK_IDENTIFIER) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we could do this in a way there is no duplicated code..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we can easily do it. The _parse_construct
does handle the parenthesis and the elements in the coma-separated values.
We could extract the coma-separated values into a dedicated function and maybe check the parenthesis in another one, but like, the problem is knowing whether or not we have a string vs a first number to decide what to do afterwards. This makes it a bit complicated as get_token consumes to token (so checking for a number means we have to consume the number), so it's hard to check things beforehands.
If we want simpler solution, I can try, at least, to add a _parse_number_or_identifier()
or something ? So that at least the number checking can be avoided as duplicated code ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At worst you can add a FIXME/TODO comment.
Both the binary format version and the text format version should be incremented for this. |
Why the binary format? This doesn't affect it |
162f546
to
de50735
Compare
Seems like a huge performance buff. Are there any drawbacks or other changes from user side? |
I don't think I want to change the format of other arrays for now. For several reasons:
|
The main drawback is that those are not editable in the text file, with a text editor (which is kind of the whole point of text files as resources). Also it a bit less VCS-friendly. |
My suggestion from RocketChat still stands that both formats should be supported, with Base64 used when the Packed Array is deemed too large (arbitrary, I know). |
What would be the benefit? And what would be the threshold for "too large"? |
Well, that would depend on how precise you want to go about it. The easy way out is, say, assume that any Packed Array with more than 32 elements is entirely incomprehensible even in text form and should be encoded anyway (depends on the array type). The benefit is that users that actually edit PackedByteArray & Co. inside the
|
Once again, the main problem with that approach is that base64 is not necessarily efficient at storing those types. So if your packed mostly int array consists or numbers, let's say, inferior to On the other hand, packed bytes are mostly used to store raw binary data, which will most likely use the full set of possible values. That's why there's little risk in using base64 here, as we can assume it will be efficient most of the time. To me, there's no reason to base64-encode other arrays for now. It's complicated and is not ensured to be working anyway. I don't mind if someone wants to try it later on, want to make it optional, or maybe find ways to determine if an array should rather be base64-encode or not, but this is out of the scope of this PR. I just wanted to solve this for TileMaps for now, which this PR does fine. |
@groud I believe what Mickeon suggested was not to encode other types of arrays as base64, but to have a threshold for which PackedByteArrays should be base64 encoded. E.g. only when the PackedByteArray has more than 10 elements or whatever would be a good threshold for a tradeoff between readable and human-editable encoding, and efficiency. |
Well, I don't mind doing so only for PackedByteArrays. Not sure if it it worth the added code but it's not too much work. As you want. |
Personally I'm not sure it's worth adding some complexity. But it could make sense to have a look at some examples of small PackedByteArrays to see how they end up with base64 encoded, so we can see if indeed having a threshold is worth it. If the base64 representation of small arrays is bigger than the current representation, it can indeed make sense to avoid. And I guess for small, hand-written arrays (<10 or 20 elements), it could make sense to keep things VCS friendly. But for anything that's just a raw buffer dump, base64 should be the way to go. But again, let's have a look at some concrete examples if someone wants to put them together, so we can decide. |
Might even be worth having a hint for this, but unsure if the complexity is desired, but that would allow us to mark properties that aren't designed to be human readable to be compressed, and others not, could be applied for all the packed arrays |
I'd say not much. If you have one or two small arrays, it doesn't matter if they get a few characters longer. It would only make some real difference if you have lots of small byte arrays, but it sounds like unlikely use-case, idk. It would be great if this gets merged for the same release as #89179, because upgrading the TileMaps and then upgrading the arrays would create lots of VCS noise; better to do it in one go. |
I concur. My opinion should not be a blocker for this. It's better than not doing compression at all. |
btw the new TileMapLayer format saved in Base64 is shorter than the previous EDIT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems ok for now. From our discussion with @Faless all this code will at some point in the future will need to be isolated to make it auditable for security issues (given it can be used as network I/O or savefiles), meaning in that case it will have to change anyway.
Thanks! |
It's too late, but I wonder if format change was necessary. It makes all scenes incompatible with older versions, while the change itself would only affect scenes with PackedByteArray (which I think are super rare outside TileMaps). While it doesn't really affect regular users, it's going to be annoying to test projects in multiple engine versions. Maybe we could make Godot ignore scene format in dev builds, idk 🤔 |
Welcome to my world. Dealing with multiple format versions is a pain in the neck and I end up having to create parsers for each version. If the text resource loaders/parsers were at least version-aware and could deal with parsing older versions of the text/binary formats, this wouldn't be a problem. |
On the Godot contributors chat, we discussed a workaround in which the scene/resource format would be upgraded only if at least one PackedByteArray was serialized. EditorSettings generally doesn't contain any PackedByteArray, so this would allow it to keep working regardless of version. |
Currently using 4.2 as a person who use separate App Network byte sending to byte_to_var from godot UDP or TCP Server, will i be affected by this? there is for example: udp:
tcp:
so now im unsure how i will need encode this on separate App side? Imo you should just add PackedByteArray and EncodedPackedByteArray |
No, this does not impact binary serialization, only the text one (mainly for |
thanks for info, but are you sure? there is definetly usage in UDP:
will this work exactly same when i send same byte format from separate non-godot App? If this is really just for files, then im happy. |
Yes I am. The modified |
That saves a lot of space in text resource files for properties that will likely never require to be modified manually.
This of course keeps compatibility.
This is quite useful TileMaps (see #89179 )
Edit: I made a comparison with my the TileMapLayer PR, this is what I get:
Peek.05-03-2024.18-09.mp4
The character count is likely divided by ~2. But most of the TileMapLayer data contains single-digit values, which is only 1 char. With byte values above 10 (or worse, 100), the gain would be even better.