Add a field to limit the size of uploading content #1701
base: main
Conversation
Was the goal to read blob data in smaller chunks and to raise an error in case we block the API for too long?
pulp_container/app/registry_api.py, lines 1353 to 1360 in 25abfd6:
while True:
    subchunk = chunk.read(2000000)
    if not subchunk:
        break
    temp_file.write(subchunk)
    size += len(subchunk)
    for algorithm in Artifact.DIGEST_FIELDS:
        hashers[algorithm].update(subchunk)
I see a threshold defined for config blobs: https://github.com/containers/image/blob/1dbd8fbbe51653e8a304122804431b07a1060d06/internal/image/oci.go#L63-L83. But, I could not find any thresholds for regular blobs. Can we investigate this?
pulp_container/app/registry_api.py
Outdated
@@ -938,6 +942,10 @@ def put(self, request, path, pk=None):
        repository.pending_blobs.add(blob)
        return BlobResponse(blob, path, 201, request)

    def _verify_payload_size(self, distribution, chunk):
        if distribution.max_payload_size and chunk.reader.length > distribution.max_payload_size:
            raise PayloadTooLarge()
Is this exception properly rendered to the container's API format?
ref: def handle_exception(self, exc):
Some of the exceptions are properly documented at https://docker-docs.uclv.cu/registry/spec/api/#blob-upload.
Ahn... my bad, I didn't know about the spec for errors.
I also found this doc https://github.com/opencontainers/distribution-spec/blob/v1.0.1/spec.md#error-codes that will be helpful.
pulp_container/app/models.py
Outdated
@@ -751,6 +751,8 @@ class ContainerDistribution(Distribution, AutoAddObjPermsMixin):
        null=True,
    )

    max_payload_size = models.IntegerField(null=True)
I do not think we want to make this parameter configurable. I vote for having a constant for this.
Hum... when I was writing this PR, I was thinking about a feature that allows setting a maximum size for blob layers and denies the upload if the limit is exceeded.
I also couldn't find a limit for regular blobs. From what I could understand, they limit only the size of buffered "resources" (manifests, config blobs, signatures, etc.) to avoid OOM issues.
Okay, I think we can follow that path. Let's then focus on manifests, config blobs, and signatures exclusively.
After an internal discussion, here is the conclusion we reached:
This change makes sense to me; however, besides the manifest directive, we also need to cap [...]. Config blobs are left out, but I don't see an easy way to set a 4MB limit specifically on them given that they are in [...]. One thing I do not understand: if a malicious user can create a big manifest that leads to big memory consumption, what prevents them from just creating a big blob? There is no limit set on the blobs.
Skimming through the commit description and changes of containers/image@61096ab, they are mostly targeted at the client side, so the client is not susceptible to a DDoS attack during pull operations.
Hm, apparently the changes were also applied to some upload calls, like https://github.com/containers/image/blob/61096ab72530cc9216b50d9fc3bee90b53704f68/docker/docker_image_dest.go#L630
CHANGES/532.feature
Outdated
@@ -0,0 +1,2 @@
Added a limit to the size of manifests and signatures during sync tasks and
Can you be specific and mention that this is a 4MB limit?
Can you also explain why this limit was added?
@@ -545,6 +549,14 @@ async def create_signatures(self, man_dc, signature_source):
                        "Error: {} {}".format(signature_url, exc.status, exc.message)
                    )

                if not is_signature_size_valid(signature_download_result.path):
                    log.info(
                        "Signature size is not valid, the max allowed size is {}.".format(
How about 'signature body size exceeded maximum allowed size of X'?
                            SIGNATURE_PAYLOAD_MAX_SIZE
                        )
                    )
                    raise ManifestSignatureInvalid(digest=man_digest_reformatted)
when used, this exception is raised to the container client
@@ -566,7 +578,12 @@ async def create_signatures(self, man_dc, signature_source):
            # signature extensions endpoint does not like any unnecessary headers to be sent
            await signatures_downloader.run(extra_data={"headers": {}})
            with open(signatures_downloader.path) as signatures_fd:
You have the signatures_downloader.path, so why not call 'is_valid_size' before opening/reading it?
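For illustration, a minimal sketch of that pre-check, assuming the payload has already been written to disk by the downloader; the helper name mirrors the one used in this PR, and the 4 MiB constant is only an assumption:

```python
import os

# Assumption: a 4 MiB cap, as discussed in this thread.
SIGNATURE_PAYLOAD_MAX_SIZE = 4 * 1024 * 1024


def is_signature_size_valid(path):
    """Check the on-disk size of the downloaded signature before reading it."""
    # os.path.getsize only stats the file, so nothing is loaded into memory here
    return os.path.getsize(path) <= SIGNATURE_PAYLOAD_MAX_SIZE
```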
You're missing a check for manifests:
with open(response.path, "rb") as content_file:
pulp_container/app/utils.py
Outdated
def _is_manifest_size_valid(manifests):
    for manifest in manifests:
        if manifest.get("size") > MANIFEST_PAYLOAD_MAX_SIZE:
I would not trust what's written in the manifest list JSON file; I'd rather actually check the manifest file size.
@@ -38,3 +38,25 @@ location /token/ {
    proxy_redirect off;
    proxy_pass http://pulp-api;
}

location ~* /v2/.*/manifests/.*$ {
What about the Apache snippet?
For the sync workflow, we sometimes skip a tag or an entire manifest list when the signature is missing or when one image manifest from the list is unsigned. Do we also want to skip objects that exceed the size limit instead of failing the whole sync?
I'm not familiar with the code base, but @git-hyagi asked about the Apache config. There we couldn't find a config that applied LimitRequestBody to proxied requests.
My suggestion is to enforce the checks in the application itself. Having the proxy (nginx, apache, or whatever) enforce them can be considered an optimization, but Pulp remains responsible for the limit.
I have added some inline observations that I think should make it a bit more efficient, but that is all based on familiarity with Django & django-rest-framework that was mostly built up around 10 years ago. I read some relevant information, but it's possible I missed things.
pulp_container/app/downloaders.py
Outdated
total_size = 0
buffer = b""
async for chunk in response.content.iter_chunked(MEGABYTE):
I'm not familiar with the codebase, but would it make sense to send HTTP 413 Request Entity Too Large before you start reading, by inspecting the Content-Length header (which I already saw was done elsewhere)?
Hum... it seems like a good idea, but my concern is that it is easy to modify the header and bypass the verification, so I'm not sure we can trust only the Content-Length value. What do you think?
If you really want to be safe, you can apply both: first check that the header isn't bigger than the maximum size, and then make sure you read at most the Content-Length size. Though I think Content-Length is an optional header, so if it's not present you should keep the current code anyway.
I'd recommend reading aiohttp's documentation on what you can (and can't) assume.
pulp_container/app/registry_api.py
Outdated
@@ -1426,7 +1425,7 @@ def put(self, request, path, pk):
        except models.Manifest.DoesNotExist:
            raise ManifestNotFound(reference=pk)

        signature_payload = request.META["wsgi.input"].read(SIGNATURE_PAYLOAD_MAX_SIZE)
        signature_payload = request.META["wsgi.input"].read()
It surprises me that this uses such a low-level implementation. If I read the code right, request is a rest_framework.request.Request, which has a request.stream attribute to get the body as io.BytesIO. Isn't that a much more portable interface? Then I think you can actually use signature_dict = json.load(request.stream) to save copying the full response into memory an additional time.
And if you want to guard it, you can inspect the length explicitly prior to loading and send HTTP 413. See _load_stream() for how they read the length:
https://github.com/encode/django-rest-framework/blob/f593f5752c45e06147231bbfd74c02384791e074/rest_framework/request.py#L297-L314
My primary reason to avoid wsgi.input (and this isn't the only occurrence) is that it blocks you from moving to ASGI and eventually more performant application servers.
@git-hyagi, can you also address this comment, please? It is worth the effort.
pulp_container/app/downloaders.py
Outdated
total_size += len(chunk)
if max_body_size and total_size > max_body_size:
    await self.finalize()
    raise InvalidRequest(resource_body_size_exceeded_msg(content_type, max_body_size))
Does this result in a HTTP 413 error?
I wonder about the impact of this change. Does it mean that pulp-to-pulp syncing (or any syncing in general) will not work when someone already uploaded/synced a larger content unit? I am still a bit hesitant to include this change.
Does this result in a HTTP 413 error?
No 😢 it is following the oci distribution spec https://github.com/opencontainers/distribution-spec/blob/v1.0.1/spec.md#error-codes (which does not have a specific error for request entity too large)
I wonder about the impact of this change. Does it mean that pulp-to-pulp syncing (or any syncing in general) will not work when someone already uploaded/synced a larger content unit? I am still a bit hesitant to include this change.
I modified the PR to allow defining these limits in the settings.py (instead of being constants). By default, if these values are not defined, no limit will be enforced, ensuring that the functionality remains the same as before.
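A minimal sketch of this opt-in behaviour; the attribute names follow the settings mentioned in this thread, while the getattr defaults and the helper itself are assumptions rather than the PR's exact code:

```python
from django.conf import settings

# Assumption: None means "no limit configured", which keeps the previous behaviour intact.
MANIFEST_PAYLOAD_MAX_SIZE = getattr(settings, "MANIFEST_PAYLOAD_MAX_SIZE", None)
SIGNATURE_PAYLOAD_MAX_SIZE = getattr(settings, "SIGNATURE_PAYLOAD_MAX_SIZE", None)


def check_body_size(total_size, max_body_size):
    """Enforce a limit only when one was actually configured."""
    if max_body_size is not None and total_size > max_body_size:
        raise ValueError("Body size exceeded the maximum allowed size.")
```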
You can also see the API doc here: https://distribution.github.io/distribution/spec/api/. How is this InvalidRequest rendered on the client side when using podman?
Hum... this is a sample output that I received while trying to push a manifest that exceeded the configured size limit:
Writing manifest to image destination
WARN[0010] Failed, retrying in 1s ... (1/3). Error: writing manifest: uploading manifest latest to pulp-container:5001/test: unknown: Manifest body size exceeded the maximum allowed size of 10 bytes.
...
WARN[0014] Failed, retrying in 1s ... (2/3). Error: writing manifest: uploading manifest latest to pulp-container:5001/test: unknown: Manifest body size exceeded the maximum allowed size of 10 bytes.
...
WARN[0019] Failed, retrying in 1s ... (3/3). Error: writing manifest: uploading manifest latest to pulp-container:5001/test: unknown: Manifest body size exceeded the maximum allowed size of 10 bytes.
...
Error: writing manifest: uploading manifest latest to pulp-container:5001/test: unknown: Manifest body size exceeded the maximum allowed size of 10 bytes.
This is what Docker Hub returns when I try to upload a file larger than 4MB.
tr -dc 'a-zA-Z' < /dev/urandom | head -c 5194304 > output.txt
cat output.txt | http --auth-type=jwt --auth=$TOKEN_AUTH PUT https://registry-1.docker.io/v2/lmjachky/pulp/manifests/new
HTTP/1.1 400 Bad Request
content-length: 110
content-type: application/json
date: Thu, 19 Sep 2024 09:25:15 GMT
docker-distribution-api-version: registry/2.0
docker-ratelimit-source: 2a02:8308:b093:8e00:62b:28c5:e440:e0b9
strict-transport-security: max-age=31536000
{
"errors": [
{
"code": "MANIFEST_INVALID",
"detail": "http: request body too large",
"message": "manifest invalid"
}
]
}
We should be returning a formatted error response, similarly to our existing MANIFEST_INVALID:
detail={
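A hedged sketch of such a formatted error, modelled on the Docker Hub payload above; the class name and the DRF wiring are assumptions, not the plugin's existing exception:

```python
from rest_framework import status
from rest_framework.exceptions import APIException


class ManifestTooLarge(APIException):
    """Illustrative: renders the registry-style errors list with an HTTP 413 status."""

    status_code = status.HTTP_413_REQUEST_ENTITY_TOO_LARGE
    default_code = "manifest_invalid"
    default_detail = {
        "errors": [
            {
                "code": "MANIFEST_INVALID",
                "message": "manifest invalid",
                "detail": "http: request body too large",
            }
        ]
    }
```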
No 😢 it is following the oci distribution spec https://github.com/opencontainers/distribution-spec/blob/v1.0.1/spec.md#error-codes (which does not have a specific error for request entity too large)
It says 4xx code:
A 4XX response code from the registry MAY return a body in any format.
So I don't think it forbids using code 413. Did you intend to link to https://github.com/opencontainers/distribution-spec/blob/v1.0.1/spec.md#determining-support which does list error codes in the Failure column (and supports what you said)?
It says 4xx code:
A 4XX response code from the registry MAY return a body in any format.
So I don't think it forbids using code 413.
hum.... this is a good point.
It seems like this doc is a little bit confusing. I thought that 413 was not included in the spec because the "Error Codes" section states:
The code field MUST be one of the following:
and there isn't an "entity too large" error in the table.
Perhaps worth raising an issue on the spec to get an opinion?
Creating a manifest larger than 4MB is quite challenging. Currently, registries have adopted this 4MB limit for uploading. However, it is possible that we will hit the limit more frequently in the future. Given this, should we consider removing the restriction if there is a chance it might be lifted later on? Alternatively, would it be better to maintain the current limit and allow administrators to fiddle with this constraint? Are administrators ever going to touch the proposed setting? So many questions... I am sorry, but I am leaning towards closing the PR and the issue completely. Do you mind bringing in more voices?
I agree that we still have a lot of questions. With the OCI spec supporting a wide range of different "artifacts", we may run into problems fine-tuning a good value for these constraints, and adding this setting (allowing admins to change the limit value) could just mean more code to maintain for something that no one would ever need to change.
CHANGES/532.feature
Outdated
Added support to the `MANIFEST_PAYLOAD_MAX_SIZE` and `SIGNATURE_PAYLOAD_MAX_SIZE` settings to define
limits (for the size of Manifests and Signatures) to protect against OOM DoS attacks during synchronization tasks
and image uploads.
Additionally, the Nginx snippet has been updated to enforce the limit for these endpoints.
Modified the internal logic of Blob uploads to read the receiving layers in chunks,
thereby reducing the memory footprint of the process.
We should refine this changelog to make it shorter, simpler, and more user-friendly. For instance, we can omit details like "Modified the internal logic of Blob uploads to read the receiving layers in chunks, thereby reducing the memory footprint of the process".
Based on the discussion we had, you should now update the code to start returning 413 in case the processed entity is too large (https://github.com/pulp/pulp_container/pull/1701/files#r1766498485).
More observations:
- The settings should be defined in the settings.py file. We should not run settings.get("MANIFEST_PAYLOAD_MAX_SIZE", None) in "heavy-duty" situations.
- Once again, please, do not forget to use request.stream in other places as well.
- It is up to you what you decide to do with web-server snippets. I think we should drop any changes made to the nginx.conf file and keep the apache config untouched.
+1 to revert changes made to the snippets.
log.warning(e.args[0])
return None, None, None
We are addressing another feature request with this change: #648. Should we treat a corrupted manifest with the same level of importance as a manifest larger than 4MB+?
Even though I suggested it, I'm cautious about including the request.META["wsgi.input"] removal in the same commit. I don't know how strict you are, but I always prefer to write such a refactor before implementing a new feature.
pulp_container/app/registry_api.py
Outdated
@@ -867,7 +870,7 @@ def partial_update(self, request, path, pk=None):
        """
        _, repository = self.get_dr_push(request, path)
        upload = get_object_or_404(models.Upload, repository=repository, pk=pk)
        chunk = request.META["wsgi.input"]
        chunk = request.stream or BytesIO()
If request.stream is unavailable, shouldn't that be a fatal error? Otherwise you'll just read an empty "file", right?
I see this is only used in some conditional path, so should chunk = request.stream only be defined in the if range_header section below, with a fatal check?
If request.stream is unavailable, shouldn't that be a fatal error? Otherwise you'll just read an empty "file", right?
From the oci spec, manifests with empty layers (0 bytes blobs) are considered valid (even though not supported by all registries):
Implementers note: The following is considered GUIDANCE for portability.
Parts of the spec necessitate including a descriptor to a blob where some implementations of artifacts do not have associated content. While an empty blob (size of 0) may be preferable, practice has shown that not to be ubiquitously supported.
Considering this change (usage of stream instead of wsgi.input) is bringing more concerns, we discussed it and decided to handle it in a separate PR.
Adds new settings to limit the size of manifests and signatures as a safeguard against DDoS attacks during sync and upload operations. Modifies the blob upload to read the layers in chunks. closes: pulp#532
What are we going to do with content that was already synced into Pulp and is larger than 4MB? Are we prepared to answer questions like "why I cannot see my content after pulp-to-pulp syncing"?
I am about 80% inclined to close this PR. I think we need another, final round of discussing this with other people.
Added a limit of 4MB to non-Blob content, through the `OCI_PAYLOAD_MAX_SIZE` setting, to protect
against OOM DoS attacks.
Suggested change:
Introduced a size limit for non-Blob content (i.e., OCI manifests/signatures) to prevent OOM DoS attacks during syncing and pushing.
The limit is configurable via the `MAX_MANIFEST_BODY_SIZE` setting, with a default of 4MB.
Let's try to be more explicit.
# This limit is also valid for docker manifests, but we will use the OCI_ prefix
# (instead of ARTIFACT_) to avoid confusion with pulpcore artifacts.
OCI_PAYLOAD_MAX_SIZE = 4_000_000
Suggested change:
# This limit is also valid for docker manifests
MAX_MANIFEST_BODY_SIZE = 4_000_000
Is it 4MB?
>>> 1 << 20
1048576
>>> 1024 * 1024
1048576
>>> 4 * 1 << 20
4194304
>>> 4 * 1024 * 1024
4194304
ref: https://github.com/distribution/distribution/blob/d0eebf3af4fc1d5c0287e5af61147403ccb78ec2/registry/handlers/manifests.go#L29, https://github.com/containers/image/blob/8d792a4a930c36ae3228061531cca0958ba4fe0a/internal/iolimits/iolimits.go#L20
log.warning(
    "Failed to sync signature {}. Error: {}".format(signature_url, e.args[0])
)
signature_counter += 1
signature_counter += 1
# content_type is defined by aiohttp based on the definition of the content-type header.
# When it is not set, aiohttp defines it as "application/octet-stream"
# note: http content-type header can be manipulated, making it easy to bypass this
# size restriction, but checking the manifest content is also not a feasible solution
# because we would need to first download it.
😭 So, we actually need to depend on this optional header. Meaning that we will never be able to assure that no manifest larger than 4MB will be synced into Pulp... Why try to support it then? I am still not sold on this restriction.
I have not skimmed through the code much, but it appears that quay "reads" an already downloaded manifest with the 4MB limitation. It is not like they restrict this during the syncing/downloading/uploading. Or am I wrong? See https://github.com/containers/image/blob/cba49408c0ea237a6aa6dba5e81b74f4a8f23480/docker/docker_client.go#L966 and other occurrences of ReadAtMost where the limits from https://github.com/containers/image/blob/8d792a4a930c36ae3228061531cca0958ba4fe0a/internal/iolimits/iolimits.go#L13 are used.
Also, it is somewhat unfortunate that we are relying on the downloader to determine the type of the data, as its sole responsibility should be downloading data without being aware of the types. Shifting this check to the sync pipeline seems to be a slightly better solution after all. However, this would mean that we will download, e.g., 1GB of data and then throw it away. Perhaps that is fine. I am sorry for dragging you away.
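For comparison, a minimal Python analogue of that ReadAtMost pattern applied to an already-downloaded file; the name and the exception are illustrative, not the plugin's code:

```python
def read_at_most(path, limit):
    """Read up to `limit` bytes from a downloaded file and fail if it holds more."""
    with open(path, "rb") as fd:
        # read one extra byte so an oversized file can be detected without trusting
        # any size declared elsewhere (e.g. in a manifest list or an HTTP header)
        data = fd.read(limit + 1)
    if len(data) > limit:
        raise ValueError(f"Content exceeded the maximum allowed size of {limit} bytes.")
    return data
```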
Resurrecting this comment: #1701 (comment)
    response = await downloader.run(extra_data={"headers": V2_ACCEPT_HEADERS})
except PayloadTooLarge as e:
    log.warning(e.message + ": max size limit exceeded!")
    raise RuntimeError("Manifest max size limit exceeded.")
Does this result in a failed task or a skipped manifest list? I thought we agreed on skipping the manifest list, didn't we?
closes: #532