-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SSLEOFError when scanning a windows docker image #555
Comments
Hi, how often does this happen? Does it always happen with the same Docker image? |
Hi @agateau-gg 👋🏼 I tried this twice in a row - the second not too long after the first one failed, to see if it was transient or not. |
I've just tried again this morning, and have the same issue @agateau-gg |
Sorry for the late reply. Is your CI running behind an HTTP proxy? If so, what you are experiencing could be urllib3 issue #2164, which is in turn caused by Python issue #86793. This bug has been fixed in Python 3.9.13, 3.10.5 and in 3.11.0. |
Hi @agateau-gg We are running Python 3.11.4; at the time I raised the issue, we would have been running 3.11.3 Python reports no proxies are in use either: |
Having more info could help us debug that. Can you try to add a call to You can also try adding |
Here's the output of the debug command, and the
|
The SSL errors may be caused by multiple network requests stepping on each other. There is an undocumented environment variable to limit the number of network requests: Can you try the following?
You can verify the environment variable is taken into account by running the command with
|
Thanks - I have set the env var via:
which should work as that's the same syntax we are using elsewhere in our powershell script. I tried again with the branch build but still have the same problem. I can't see the
|
I think there is an issue with the way the ggshield branch has been installed: as long as Based on the log output you posted, the path to the ggshield command is |
Ah, that is my bad @agateau-gg - I didn't have the
|
OK, I pushed another branch: agateau/debug-555 with more changes:
I'd like to investigate if the problem is:
Can you start with running the scan from the agateau/debug-555 branch with GG_MAX_WORKERS to 1? If it passes then A is probably the issue. If it still does not pass, can you run the scan with GG_MAX_WORKERS and GG_MAX_DOCS both set to 1, at least 2 or 3 times? If the scan always fail on the same file, then the problem is B and we need to investigate what's wrong with that particular file. Warning: scanning with GG_MAX_WORKERS=1 and GG_MAX_DOCS=1 is going to be slow... |
Hi @agateau-gg 👋🏼 The scan failed with the SSLEOFError with The scan failed with different errors with Here's the last snippets of the bottom of each log (I tried 3 times) for the latter scenario; you'll notice effectively the same output, aside from the % completion:
On a side note, it would be really great if we could skip the first layer or so of known images. I saw some changes recently re. caching scan results, but that likely won't help us with our ephemeral build agents (unless you guys cache the results for each layer hash on your side). |
That's interesting: the "maximum allowed size" bug is #561, which we plan to work on soon. I wonder if the SSL error could be caused by that 🤔. I am going to make some changes to the debug branch to artificially restrict this size so that you don't hit that bug anymore.
Skipping the first layer was our initial plan, but it felt less efficient if other layers add a lot of tools to the image as these layers would still be scanned every time. CI systems often have the ability to make a given directory persist between runs, even with ephemeral build agents. Have you looked into this? |
Just pushed a new commit to the |
Sorry @agateau-gg , I still get the same error. Additionally, I can't find the "to avoid bug" output in the build log
|
I found another issue which would cause the "Scanning failed: file exceeds the maximum allowed size of 10485800B". I pushed another commit to agateau/debug-555 to workaround it. Can you give it a try? |
Same again, sorry @agateau-gg
|
I'd like to eliminate the possibility of an SSL connection issue between your server and GitGuardian. I attached a small Python script call netcheck.py to this comment. It works by repeatedly sending a GET request to our API server using a variety of methods. Can you install it in the same virtualenv as ggshield, then run it with |
Here's what that returned @agateau-gg
|
Thanks, the urllib error can be ignored because ggshield only uses requests. I created a git repository for netcheck and made some improvements to it. Can you get it from https://github.com/agateau-gg/netcheck and re-run the test with |
Here you go @agateau-gg
|
OK thanks, so netcheck found nothing 😞. At least this culprit has been eliminated. Another possible culprit might be the document size. From what I see in your log output, your server is configured with a maximum doc size of 10,485,800. That's quite large. I just rebased the agateau/debug-555 branch on top of main and added yet another environment variable: Can you try running a scan with I am going to do a similar test on my side. |
Is there a new dep, or has one of the python deps removed their dependency on cryptography?
|
This is odd: the dependency on cryptography has been added as optional in 1.17.2. It is still optional as of now. In any case, the dependency is going to be mandatory in 1.18.0, so you might as well install it. |
🎉 🎉 🎉 🎉 🎉
FWIW I think our max doc size was set by the support team as the scanner was causing windows docker image builds to fail with a non-zero exit code. |
Interesting! Can you try a few different values for the maximum size so that we have a rough idea of the point it starts to cause problems? |
Here you go @agateau-gg
|
Thanks, looks like it break at quite a low size :(. Going to try to reproduce on our side and see if we can fix this server side. I still don't understand why it causes SSL failures though. |
I can reproduce the bug on our side, we are going to investigate. We'll keep you posted. |
I spent more time investigating, and I now know more about what's happening. Our SaaS instance has a payload limit of ~30M. Since ggshield sends documents to scan in batch of 20 and your maximum document size is set to 10,485,800 bytes, it's quite easy to create a payload larger than 30M. When this happens, the server abruptly stops the incoming HTTP request, causing the SSL layer to report an "unexpected EOF". We are going to work on a proper solution to support scans of larger documents without hitting the server payload limit. In the mean time, I created the
Let me know if this helps. |
Thanks @agateau-gg - I think we'll await the official fix. |
Hi @agateau-gg 👋🏼 Can you please confirm whether this issue was fixed by #823 ? We are still seeing this problem crop up, but are pinned to |
Oh, it should have been fixed in 1.24.0 😞. I am trying to reproduce on my side. In the mean time, can you add the |
Hi @agateau-gg, I'll be taking the reins on this as @testworksau is heading off on some well-deserved leave. As requested, here's the failure when running with the
|
Hi @santhonisz, I just pushed a branch which does two things:
You can install the branch with If it fails, can you set the environment variable |
No luck I'm afraid. Here's the (filtered) debug log output using your branch version:
|
Damn, I thought we would have gotten more useful information from that run 😞. Going to prepare another branch tomorrow. |
Hi, can you try again? I added more logs and 5 retries in case of failures. No need to set |
Still failing, here's the latest output:
|
I am running out of ideas here :/. Going to check with my colleagues how to debug that. Sorry for the inconvenience. |
Sorry for the lack of response, we are still investigating on our side. In the mean time, do you have a way to run the same scan from a Linux machine? I am wondering if the OS could have an impact on that bug. |
Environment
Describe the bug
We are seeing this issue on one of our builds:
Steps to reproduce:
Actual result:
ERROR: Error scanning: HTTPSConnectionPool(host='api.gitguardian.com', port=443): Max retries exceeded with url: /v1/multiscan?ignore_known_secrets=True (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2423)')))
Expected result:
The build should work successfully.
The text was updated successfully, but these errors were encountered: