
LocalStack Tests Flaky in CI #5283

Closed
tustvold opened this issue Jan 5, 2024 · 6 comments · Fixed by #5570
Labels
bug, development-process (Related to development process of arrow-rs)

Comments

@tustvold
Contributor

tustvold commented Jan 5, 2024

Describe the bug

My suspicion is that the container is running out of memory or otherwise crashing, but I have not been able to reproduce this locally.

To Reproduce

Expected behavior

Additional context

tustvold added the bug and development-process labels on Jan 5, 2024
@tustvold
Contributor Author

tustvold commented Jan 5, 2024

Error in https://github.com/apache/arrow-rs/actions/runs/7422271983/job/20197235081?pr=5285

Traceback (most recent call last):
  File "/opt/code/localstack/localstack/http/asgi.py", line 548, in handle_http
    await response.write(packet)
  File "/opt/code/localstack/localstack/http/asgi.py", line 302, in write
    await self.send({"type": "http.response.body", "body": data, "more_body": True})
  File "/opt/code/localstack/.venv/lib/python3.11/site-packages/hypercorn/protocol/http_stream.py", line 179, in app_send
    await self.send(
  File "/opt/code/localstack/.venv/lib/python3.11/site-packages/hypercorn/protocol/h11.py", line 136, in stream_send
    await self._send_h11_event(h11.Data(data=event.data))
  File "/opt/code/localstack/.venv/lib/python3.11/site-packages/hypercorn/protocol/h11.py", line 240, in _send_h11_event
    data = self.connection.send(event)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/code/localstack/.venv/lib/python3.11/site-packages/h11/_connection.py", line 512, in send
    data_list = self.send_with_data_passthrough(event)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/code/localstack/.venv/lib/python3.11/site-packages/h11/_connection.py", line 545, in send_with_data_passthrough
    writer(event, data_list.append)
  File "/opt/code/localstack/.venv/lib/python3.11/site-packages/h11/_writers.py", line 65, in __call__
    self.send_data(event.data, write)
  File "/opt/code/localstack/.venv/lib/python3.11/site-packages/h11/_writers.py", line 91, in send_data
    raise LocalProtocolError("Too much data for declared Content-Length")
h11._util.LocalProtocolError: Too much data for declared Content-Length

@tustvold
Contributor Author

tustvold commented Jan 5, 2024

Filed an upstream issue, as I'm somewhat at a loss for how to debug this further: localstack/localstack#10003

tustvold added a commit to tustvold/arrow-rs that referenced this issue Mar 3, 2024
tustvold added a commit that referenced this issue Mar 4, 2024
@bentsku

bentsku commented Mar 4, 2024

Hi @tustvold, I've seen the update in the GH issue. I'm not sure updating will help; I don't remember making many changes there. I will have another look, try to reproduce locally as best I can, and try to merge a possible race-condition fix soon. Will keep you updated!

@bentsku

bentsku commented Mar 4, 2024

Okay, I could finally reproduce it and work out which requests trigger the issue.
I started by creating a fairly resource-constrained environment (a Docker container limited to 0.5 CPUs).

It's a GetObject with Content-Length set to 1, on a key with the RACE- prefix,
followed by a PutObject of length 2.
The first call still declares a Content-Length of 1 but receives the newly written, longer data, so it fails.
The reproducer is not stable at all: I just run cargo test aws::tests::s3_test --features aws -- --exact again and again, and it fails around 2 to 10% of the time with 0.5 CPUs.

I'll work on a fix now; I might have an idea where this could come from. Thanks again for reporting this, I'm sure we will finally get to the end of this! 🤞
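
For illustration, a minimal reproducer sketch along the lines described above (this is not the actual arrow-rs test): it overlaps PutObject calls of different lengths with GetObject calls against a LocalStack S3 endpoint and checks that the size the response declares matches the body actually received. The endpoint, bucket name, credentials, key, and crate versions are placeholder assumptions, and the bucket is assumed to already exist.

// Hypothetical reproducer sketch (not the actual arrow-rs test). Assumes a
// LocalStack container on the default edge port with a pre-created bucket,
// and roughly object_store 0.9 (feature "aws"), tokio (feature "full"), bytes.
use std::sync::Arc;

use bytes::Bytes;
use object_store::aws::AmazonS3Builder;
use object_store::{path::Path, ObjectStore};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let store: Arc<dyn ObjectStore> = Arc::new(
        AmazonS3Builder::new()
            .with_endpoint("http://localhost:4566") // LocalStack default edge port
            .with_allow_http(true)
            .with_bucket_name("test-bucket")       // placeholder bucket
            .with_access_key_id("test")
            .with_secret_access_key("test")
            .with_region("us-east-1")
            .build()?,
    );
    let key = Path::from("RACE-object");

    // Writer: alternate between a 1-byte and a 2-byte object under the same key.
    let writer = {
        let (store, key) = (Arc::clone(&store), key.clone());
        tokio::spawn(async move {
            for i in 0..1_000u32 {
                let body: &'static [u8] = if i % 2 == 0 { b"a" } else { b"ab" };
                store.put(&key, Bytes::from_static(body).into()).await.unwrap();
            }
        })
    };

    // Reader: the size declared by GetObject should always match the body received.
    let reader = tokio::spawn(async move {
        for _ in 0..1_000u32 {
            if let Ok(result) = store.get(&key).await {
                let declared = result.meta.size as u64;
                match result.bytes().await {
                    Ok(data) if data.len() as u64 != declared => {
                        eprintln!("size mismatch: declared {declared}, got {}", data.len())
                    }
                    // A server-side "Too much data for declared Content-Length"
                    // surfaces here as a failed request / truncated body.
                    Err(e) => eprintln!("get failed: {e}"),
                    _ => {}
                }
            }
        }
    });

    writer.await?;
    reader.await?;
    Ok(())
}

In principle, running this against a CPU-constrained container (e.g. 0.5 CPUs, as described above) should occasionally hit the stale Content-Length case; with ample CPU it may never trigger.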

@bentsku

bentsku commented Mar 5, 2024

Hello! The issue should now be fixed: I ran aws::tests::s3_test 200 times and could no longer reproduce it with the linked PR. The fix will, however, only be available in the latest image (it should be published in around 30 to 45 minutes); I'm not sure whether you'd be willing to use that tag. I believe 3.3 will be released at the end of March.

Thank you again for your patience and your help in uncovering the issue!

@tustvold
Contributor Author

tustvold commented Mar 11, 2024

Fantastic news! I'll probably wait until 3.3, as the regular CI failures appear to have abated for now, but I will pre-empt this and pin an image SHA should they return. Thank you for all your work on this 👍
