Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Fix COW allocates too much memory #91286

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Nazarwadim
Copy link
Contributor

@Nazarwadim Nazarwadim commented Apr 28, 2024

I found a place where COW allocates too much memory for types whose size is not a multiple of 2.
For example, we have a type size of 20 bytes and want to change the COW size to 4. To do this, call the resize(4) method.
This method calls the _get_alloc_size method which allocates too much memory for us. _next_po2(20 * 4) = 128.
But instead it should be _next_po2(4) * 20 = 80.

I haven't done performance benchmarks yet, but only looked at the amount of memory used by the project from CI https://github.com/godotengine/regression-test-project/archive/4.0.zip.

Before:
image

After:
image

@Nazarwadim Nazarwadim requested a review from a team as a code owner April 28, 2024 19:31
@AThousandShips
Copy link
Member

AThousandShips commented Apr 28, 2024

_next_po2(20 * 4) = 124

Did you mean 128?

I'm not sure about this, the power of two is expected to be on the allocated memory, otherwise it doesn't matter, it's relevant that the allocation is a power of two

It doesn't make any sense to me to allocate for a power of two number of elements, so either it should be a power of two bytes, or nothing at all

@Calinou Calinou added the bug label Apr 28, 2024
@Nazarwadim
Copy link
Contributor Author

There is a lot of memory that is simply not used. Like, why allocate 32 bytes when you can allocate 20. For 2 elements, it will be 48, although there is 40 real data there.

@AThousandShips
Copy link
Member

Well it depends on what we're trying to achieve here, often allocating a power of two memory is desirable to the OS, as far as I know, I don't think the use of this particular formula here was in any way an oversight by the implementer but desired

Copy link
Member

@AThousandShips AThousandShips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above issues with _get_alloc_size_checked need to be resolved, needs to either be reverted or confirmed that it does not reintroduce:

The other details are for discussion but this part is breaking and reverting important safety checks

@Nazarwadim
Copy link
Contributor Author

I checked the project, from issue. Godot didn't crash, but my computer did. I don't know if this is the right behavior.

@AThousandShips
Copy link
Member

Did you try with the check restored?

@Nazarwadim
Copy link
Contributor Author

I opened the file godot_4_alpha_12_tileset_issue_2.log and in a few seconds the computer ran out of memory. The same thing happens on the master branch.

@AThousandShips
Copy link
Member

The issue isn't fully solved for that side, but this check should probably not be removed, it's not unnecessary, so unless you can confirm it's unnecessary it should be restored IMO, the zero check can be removed but not the return *out; part

@Nazarwadim
Copy link
Contributor Author

Nazarwadim commented Apr 28, 2024

If you change from return true; to return *out; and remove check, it won't work, because when p_elements = 0, false will be returned.

@Nazarwadim
Copy link
Contributor Author

Nazarwadim commented Apr 28, 2024

And this zero check with return *out; is unnecessary due to this.
ERR_FAIL_COND_V(!_get_alloc_size_checked(p_size, &alloc_size), ERR_OUT_OF_MEMORY);
If we leave it as it was, then the compiler does two checks. The first one is whether x == 0. And the second one is this one, since it cannot know that *out will definitely be 0. But if I return true, the compiler will remove this check, which, by the way, reduces the size of the binary and improves performance a little.
To be more precise, the size of a mono release before 77 878 576 and after 77 845 808.

I also took a closer look at your PR and noticed that before COW was 32-bit. This explains the overflow.

_next_po2 method body.

if (sizeof(USize) == 8) {
	x |= x >> 32;
}

This fixed the problem.

@timothyqiu
Copy link
Member

Like, why allocate 32 bytes when you can allocate 20. For 2 elements, it will be 48, although there is 40 real data there.

See this explaination by Juan: https://youtu.be/BvpBEIe86CE?t=1630

@Nazarwadim
Copy link
Contributor Author

Nazarwadim commented Apr 29, 2024

I just realized today that we don't allocate 2^n bytes. We allocate 2^n + 16 and + 16 if debug.

uint8_t *mem_new = (uint8_t *)Memory::alloc_static(alloc_size + DATA_OFFSET, false);

These are the expansion ranges before: 48 80 144 272 528 1040.
After: 36 56 96 176 336 656.

This shows that the allocation is in no way similar to 2^n.

See this explaination by Juan: https://youtu.be/BvpBEIe86CE?t=1630

From the video, I understand the doubling of the size, not that it should be strictly 2^n.

And also in other implementations of the vector, for example, std::vector, or List from C#, the expansion occurs by multiplying capacity by 2.

@bruvzg
Copy link
Member

bruvzg commented Apr 29, 2024

_next_po2(20 * 4) = 128. But instead it should be _next_po2(4) * 20 = 80.

What's the real memory usage difference, not the one shown by Godot? Underlying malloc will never allocate exact size, it's either picking up free slot from pre-allocated small size tables (usually few powers of 2) or allocating multiple of 4K pages.

@AThousandShips AThousandShips dismissed their stale review April 29, 2024 06:31

I've done my due diligence here, I'm not convinced this won't reintroduce the crash, but if you're convinced it'll be safe and other reviewers agree I'll wash my hands of it

@Nazarwadim
Copy link
Contributor Author

What's the real memory usage difference.

My 173.3 MB
Master 173.5 MB.

It looks like a much smaller difference.

I'm thinking about it now. For example, we have 1024 bytes of data and another 16 bytes of metadata(ref_couter + size). And as I understand it, malloc will allocate 2048 bytes. And when we need to expand the size to 2048, then malloc will allocate 4096. Although we only need 2048.
My idea is to adjust the size of alloc_size + DATA_OFFSET to 2^n. Most likely, this will require the following formula next_po2(p_elements * sizeof(T)) - DATA_OFFSET. I would like to hear your thoughts on this.

@bruvzg
Copy link
Member

bruvzg commented Apr 29, 2024

This makes sense to take metadata into account.

For small allocs it actually might have more options than powers of 2, like https://git.musl-libc.org/cgit/musl/tree/src/malloc/mallocng/malloc.c#n12 but it's platform/CRT dependent, so I would not relay on any specific values.

@Nazarwadim
Copy link
Contributor Author

Then do I need to do it like this so that both small and large data types take up less space?

next_po2(p_elements) * sizeof(T) - DATA_OFFSET

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants