[PULP-208] Disable download retry with on demand streaming #6087

pedro-psb
Member

#5937

Testing

I don't know how to write an automated test for this, but I verified it through the logs by running test_remote_content_changed_with_on_demand, which triggers a digest validation error in the on-demand streaming context.

Before:

[pulp]  | (OMITTED) "GET /pulp/api/v3/distributions/file/file/01936f14-21af-7102-b69b-65a30d34596e/ HTTP/1.0" 200 631 "-" "api/using_plugin/test_content_delivery.py::test_remote_content_changed_with_on_demand"',)

[pulp]  | Backing off download_wrapper(...) for 0.8s (pulpcore.exceptions.validation.DigestValidationError: A file located at the url https://127.0.0.1:44247/basic/2.iso failed validation due to checksum. Expected '89a0fbc35e07fe70fb46467326b6ab6a5bc53da48d518f3a6c6ef17dc3fab256', Actual '22f92924b8fef320f9933b196339a5dc5f1f662a27b09a3da9f2d73ce4678aa1')
[pulp]  | pulp [None]: backoff:INFO: Backing off download_wrapper(...) for 0.8s (pulpcore.exceptions.validation.DigestValidationError: A file located at the url https://127.0.0.1:44247/basic/2.iso failed validation due to checksum. Expected '89a0fbc35e07fe70fb46467326b6ab6a5bc53da48d518f3a6c6ef17dc3fab256', Actual '22f92924b8fef320f9933b196339a5dc5f1f662a27b09a3da9f2d73ce4678aa1')
[pulp]  | Backing off download_wrapper(...) for 0.2s (pulpcore.exceptions.validation.DigestValidationError: A file located at the url https://127.0.0.1:44247/basic/2.iso failed validation due to checksum. Expected '89a0fbc35e07fe70fb46467326b6ab6a5bc53da48d518f3a6c6ef17dc3fab256', Actual '22f92924b8fef320f9933b196339a5dc5f1f662a27b09a3da9f2d73ce4678aa1')
[pulp]  | pulp [None]: backoff:INFO: Backing off download_wrapper(...) for 0.2s (pulpcore.exceptions.validation.DigestValidationError: A file located at the url https://127.0.0.1:44247/basic/2.iso failed validation due to checksum. Expected '89a0fbc35e07fe70fb46467326b6ab6a5bc53da48d518f3a6c6ef17dc3fab256', Actual '22f92924b8fef320f9933b196339a5dc5f1f662a27b09a3da9f2d73ce4678aa1')
[pulp]  | Backing off download_wrapper(...) for 1.6s (pulpcore.exceptions.validation.DigestValidationError: A file located at the url https://127.0.0.1:44247/basic/2.iso failed validation due to checksum. Expected '89a0fbc35e07fe70fb46467326b6ab6a5bc53da48d518f3a6c6ef17dc3fab256', Actual '22f92924b8fef320f9933b196339a5dc5f1f662a27b09a3da9f2d73ce4678aa1')
[pulp]  | pulp [None]: backoff:INFO: Backing off download_wrapper(...) for 1.6s (pulpcore.exceptions.validation.DigestValidationError: A file located at the url https://127.0.0.1:44247/basic/2.iso failed validation due to checksum. Expected '89a0fbc35e07fe70fb46467326b6ab6a5bc53da48d518f3a6c6ef17dc3fab256', Actual '22f92924b8fef320f9933b196339a5dc5f1f662a27b09a3da9f2d73ce4678aa1')
[pulp]  | Giving up download_wrapper(...) after 4 tries (pulpcore.exceptions.validation.DigestValidationError: A file located at the url https://127.0.0.1:44247/basic/2.iso failed validation due to checksum. Expected '89a0fbc35e07fe70fb46467326b6ab6a5bc53da48d518f3a6c6ef17dc3fab256', Actual '22f92924b8fef320f9933b196339a5dc5f1f662a27b09a3da9f2d73ce4678aa1')
[pulp]  | pulp [None]: backoff:ERROR: Giving up download_wrapper(...) after 4 tries (pulpcore.exceptions.validation.DigestValidationError: A file located at the url https://127.0.0.1:44247/basic/2.iso failed validation due to checksum. Expected '89a0fbc35e07fe70fb46467326b6ab6a5bc53da48d518f3a6c6ef17dc3fab256', Actual '22f92924b8fef320f9933b196339a5dc5f1f662a27b09a3da9f2d73ce4678aa1')

[pulp]  | [2024-11-27 19:23:34 +0000] [44606] [ERROR] Error handling request

After:

[pulp]  | (OMITTED) "GET /pulp/api/v3/distributions/file/file/01936f15-b88b-770a-b1fe-16e47a99c3bf/ HTTP/1.0" 200 631 "-" "api/using_plugin/test_content_delivery.py::test_remote_content_changed_with_on_demand"',)

[pulp]  | [2024-11-27 19:25:15 +0000] [44813] [ERROR] Error handling request

@pedro-psb pedro-psb changed the title Disable download retry with on demand streaming [PULP-208] Disable download retry with on demand streaming Nov 27, 2024
@pedro-psb pedro-psb force-pushed the disable-download-retry-with-on-demand-streaming branch from e14d8e5 to ebcfb34 Compare November 27, 2024 19:35
@pedro-psb pedro-psb force-pushed the disable-download-retry-with-on-demand-streaming branch from ebcfb34 to 76c5484 Compare November 27, 2024 20:45
Member Author

I'm using extra_data instead of creating a new keyword arg retry=True because adding a new kwarg breaks some parts of the tests. Using extra_data is the path of least friction.

Also, it looks like that's the purpose of extra_data anyway.


async with self.semaphore:

@backoff_decorator
Member

Aren't there exceptions we would still like to retry with, like connection errors before we even started sending headers back to the client?
Also, as a matter of style, I'd rather conditionally apply the decorator (func = decorator(func)) instead of creating a no-op decorator.
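
A minimal sketch of the conditional-decorator style suggested above. The `retry` decorator, the `run` helper, and the `retry_enabled` flag are hypothetical stand-ins for illustration, not the pulpcore code or the backoff library's API.

```python
import functools


def retry(max_tries):
    """Hypothetical stand-in for a backoff-style retry decorator."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except ConnectionError:
                    # Re-raise once the retry budget is exhausted.
                    if attempt == max_tries:
                        raise

        return wrapper

    return decorator


def run(download, retry_enabled=True):
    # Wrap only when retries are wanted; otherwise call the function as-is.
    # No no-op decorator is needed for the "retries disabled" case.
    if retry_enabled:
        download = retry(max_tries=4)(download)
    return download()
```

The decision lives in one `if` at the call site, so the undecorated path has no wrapper at all.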

Member Author

Aren't there exceptions we would still like to retry with, like connection errors before we even started sending headers back to the client?

Yes, you are right.
So I guess the approach here could be to pass which exceptions we want to retry on (or exclude) from the default list. IMHO, exclude is more expressive for the problem.
What do you think?

Our RemoteArtifact streaming uses chunked transfer, so once it has
started a response, we can no longer fix anything if an error occurs.

Because of that, the retry logic (for timeout, digest, and other errors)
on the http downloader just delays a response that can no longer be
correct.

Closes pulp#5937
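
To illustrate why a retry cannot help once streaming has started, here is a simplified sketch. It assumes a generator that forwards chunks while hashing them; `DigestMismatch`, `stream_and_validate`, and `serve` are hypothetical names, not pulpcore code.

```python
import hashlib


class DigestMismatch(Exception):
    """Hypothetical error standing in for a digest validation failure."""


def stream_and_validate(chunks, expected_sha256):
    """Yield chunks as they arrive; the digest is only known at the end."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
        yield chunk  # at this point the chunk is on its way to the client
    if hasher.hexdigest() != expected_sha256:
        raise DigestMismatch("content does not match the expected digest")


def serve(chunks, expected_sha256):
    """Return (chunks already sent, whether validation passed)."""
    sent = []
    try:
        for chunk in stream_and_validate(chunks, expected_sha256):
            sent.append(chunk)
    except DigestMismatch:
        # Every chunk has already been sent, so retrying the download
        # here could not take back what the client received.
        return sent, False
    return sent, True
```

In this sketch the mismatch is detected only after the last chunk has been forwarded, which is the situation the commit message describes.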
@pedro-psb pedro-psb force-pushed the disable-download-retry-with-on-demand-streaming branch from ddd4e20 to caad06c Compare November 29, 2024 12:42
@pedro-psb pedro-psb marked this pull request as ready for review November 29, 2024 13:32
@@ -240,6 +242,9 @@ async def run(self, extra_data=None):
SizeValidationError,
)

retryable_errors = tuple(
[e for e in default_retryable_errors if e not in disable_retry_list]
)
Member Author

It has to be a tuple or a single exception class.
Passing anything else raises an unhelpful error.
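
A small sketch of the filtering above and of the tuple requirement. It assumes the retry logic ultimately matches exceptions with a plain `except` clause. The exception classes are local stand-ins for the pulpcore ones, and `caught_by` is a hypothetical helper.

```python
class DigestValidationError(Exception):
    """Stand-in for pulpcore's exception of the same name."""


class SizeValidationError(Exception):
    """Stand-in for pulpcore's exception of the same name."""


# Assumed default list, shortened for illustration.
default_retryable_errors = (TimeoutError, DigestValidationError, SizeValidationError)
disable_retry_list = (DigestValidationError,)

# Convert back to a tuple: an `except` clause accepts a class or a tuple of classes.
retryable_errors = tuple(
    e for e in default_retryable_errors if e not in disable_retry_list
)


def caught_by(exc, errors):
    """Return True if `exc` would be caught by `except errors`."""
    try:
        raise exc
    except errors:
        return True
    except Exception:
        return False
```

With a list instead of a tuple, Python raises a `TypeError` only at the moment an exception is being matched, and the traceback points at the `except` clause rather than at the place where the list was built.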

@@ -1138,7 +1138,9 @@ async def finalize():
original_finalize = downloader.finalize
downloader.finalize = finalize
try:
-    download_result = await downloader.run()
+    download_result = await downloader.run(
+        extra_data={"disable_retry_list": (DigestValidationError,)}
Member

Thinking about "code in the domain of the problem", I'm wondering if we should just say "This downloader is streaming" here, and let downloader.run decide how that impacts retrying.
At least we would have all the determination of what is retryable and what is not in one place.
run is an external interface, right? So we kind of need to do it right now...
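
For comparison, the "just say it is streaming" idea could look roughly like the sketch below. The `streaming` key, `retryable_errors_for`, and both constants are hypothetical. This is not what the PR implements, and the exception classes are local stand-ins.

```python
class DigestValidationError(Exception):
    """Stand-in for pulpcore's exception of the same name."""


class SizeValidationError(Exception):
    """Stand-in for pulpcore's exception of the same name."""


DEFAULT_RETRYABLE = (
    TimeoutError,
    ConnectionError,
    DigestValidationError,
    SizeValidationError,
)

# Errors that only surface after bytes have already reached the client.
NOT_RETRYABLE_WHEN_STREAMING = (DigestValidationError,)


def retryable_errors_for(extra_data=None):
    """Decide inside the downloader what is retryable for this context."""
    extra_data = extra_data or {}
    if extra_data.get("streaming", False):
        return tuple(
            e for e in DEFAULT_RETRYABLE if e not in NOT_RETRYABLE_WHEN_STREAMING
        )
    return DEFAULT_RETRYABLE
```

The caller would then pass only `extra_data={"streaming": True}`, and the rule about which errors stay retryable would live in one place.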

Member Author

It feels like the downloader is supposed to be a more generic object.
For example, the downloader doesn't know about the streaming we are doing here, so we need to explicitly patch it and add conditions to handle it as this context requires. If we want to shift approaches (make the downloader context-aware), we should move these into the downloader as well, which would make it a bit more complex.

Member

Fair point.
Actually, we monkeypatch the downloader here.

Contributor

Yes, the downloader code is a mess already, so this doesn't make it particularly worse. It would be nice to specialize the downloader object in such a way that we don't need any of this directly at the call site. But that's a more involved refactor, so I'd consider it out of scope for a bugfix. Maybe something for the future.

@mdellweg mdellweg merged commit 5462a28 into pulp:main Dec 3, 2024
12 checks passed
@pedro-psb pedro-psb deleted the disable-download-retry-with-on-demand-streaming branch December 3, 2024 13:12