Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZSTD_compressBound can silently overflow #3323

Closed
nigeltao opened this issue Nov 23, 2022 · 1 comment · Fixed by #3362
Closed

ZSTD_compressBound can silently overflow #3323

nigeltao opened this issue Nov 23, 2022 · 1 comment · Fixed by #3362
Assignees

Comments

@nigeltao
Copy link

nigeltao commented Nov 23, 2022

The size_t ZSTD_compressBound(size_t srcSize) function is equivalent to the ZSTD_COMPRESSBOUND macro and it can silently overflow, as seen by compiling this program with gcc -m32:

#include <stdint.h>
#include <stdio.h>

#define ZSTD_COMPRESSBOUND(srcSize) \
  ((srcSize) + ((srcSize) >> 8) + \
  (((srcSize) < (128 << 10)) ? (((128 << 10) - (srcSize)) >> 11) : 0))

int main(int, char**) {
  printf("sizeof(size_t)=%zu\n", sizeof(size_t));
  printf("good:     0x%08zx\n", ZSTD_COMPRESSBOUND(0xff00ff00));
  printf("overflow: 0x%08zx\n", ZSTD_COMPRESSBOUND(0xff00ff01));
  return 0;
}

Output (per https://godbolt.org/z/W5febq6x3):

sizeof(size_t)=4
good:     0xffffffff
overflow: 0x00000000

The severity is probably very low, due to the relative unlikeliness of both (1) a 32 bit system and (2) a large (4GB) input. But given that dstCapacity > ZSTD_compressBound(srcSize) enables fast paths (that presumably eschew bounds checking), it would make auditing for memory safety easier if the zstd.h header file that declares ZSTD_compressBound, in its commentary, discuss how that function can 'fail' (due to overflow) and how callers can detect that.

A similar point probably applies to ZSTD_decompressBound although that returns unsigned long long, not size_t, so IIUC should not overflow for a 4-ish GB srcSize.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Nov 23, 2022

Hi @nigeltao ,

Thanks for your report. You are right, the macro can be made to overflow. It's not going to result in safety issues though.
The main objective of ZSTD_compressBound() is to inform the user about a minimal output buffer size which guarantees ZSTD_compress() success. If a smaller value is provided, compression can still proceed, but it may result in failure.
For safety, there are other controls in place, they do not depend on this macro.

It might be useful to distinguish the macro ZSTD_COMPRESSBOUND() from the function ZSTD_compressBound() for this case.
The macro exists for static allocation purposes, like sizing a global array variable. This is a niche use case, useful in a limited number of scenarios. As the point is to produce compile-time sizes, we don't anticipate any issue with potential overflow on 32-bit systems using extreme sizes, this is way out of comfort zone for compile-time static allocation.

The function is a bit more interesting. It's expected to be used in combination with malloc(), so it meets more dynamic scenarios, which could be ground for manipulation. That being said, we don't anticipate much problems regarding the overflow, as it would mean that the input buffer size is very very close to 4 GB on a 32-bit system. It's questionable if it's even possible to allocate such a large buffer to begin with, but assuming it's possible to make it happen, it's obvious that the very limited address space left will be very troublesome for any further operation happening in the same address space, including the requested compression. So failure is the best case scenario, I would actually expect OOM to trigger first, and it's even more probable that no src buffer of a size close to maximum address space can be allocated to begin with.

Still, it's a good point for API completeness to describe what happens in case of overflow.
We could make ZSTD_compressBound() return 0 when it detects an overflow for example, knowing 0 already cannot be a valid return value. It would then be up to the user to decide if it's worth implementing a check for this rare corner case or not.

To answer your last paragraph, there is no ZSTD_decompressBound. Instead we have ZSTD_getFrameContentSize(), or its deprecated variant ZSTD_getDecompressedSize(), they report the exact size of data to decompress (when known), so it's not the result of any mathematical operation that could overflow. It's returned as a unsigned long long value, which is >= 64-bit, and since this value is stored as 64-bit (largest cases), the stored value cannot overflow the return type.

edit : scrap that last paragraph. There is a ZSTD_decompressBound() function in the unstable/experimental section of the API, I just totally forgot about it.
It multiplies the nb of blocks found in a valid frame by the maximum size of a block.
It returns a 64-bit value as a result, which seems to offer some level of protection against overflow by making srcSize requirement impractical, but that's nonetheless worth a second look, just to be sure.

@Cyan4973 Cyan4973 self-assigned this Dec 15, 2022
Cyan4973 added a commit that referenced this issue Dec 15, 2022
fixed #3323, reported by @nigeltao

Completed documentation around this risk
(which is largely theoretical,
I can't see that happening in any "real world" scenario,
but an erroneous @srcSize value could indeed trigger it).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants