pasta: port range forwarding: mismatch between data sent and received #17287
Sounds like socat needs to handle EINTR. Not sure if there is an option for this. |
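For readers unfamiliar with the pattern, "handling EINTR" usually means retrying a call that was interrupted by a signal before it transferred any data. The following is only a minimal sketch of that idea, not socat's code: `retry_on_eintr` and `FlakyReader` are hypothetical names made up for illustration. Note that Python itself has retried most interrupted system calls automatically since 3.5 (PEP 475), so the explicit loop here only mirrors what a C program would do around `read()`.

```python
import errno


def retry_on_eintr(call, *args, max_retries=100):
    """Invoke call(*args), retrying while it fails with EINTR."""
    for _ in range(max_retries):
        try:
            return call(*args)
        except InterruptedError:
            # EINTR: a signal arrived before the call completed, so try again.
            continue
    raise OSError(errno.EINTR, "still interrupted after retries")


class FlakyReader:
    """Hypothetical reader that is 'interrupted' a fixed number of times."""

    def __init__(self, data, interruptions):
        self.data = data
        self.interruptions = interruptions

    def read(self, n):
        if self.interruptions > 0:
            self.interruptions -= 1
            raise InterruptedError(errno.EINTR, "Interrupted system call")
        return self.data[:n]


reader = FlakyReader(b"xxx", interruptions=2)
data = retry_on_eintr(reader.read, 3)
```

Without the retry, the first interruption would end the transfer early, which is consistent with the truncated output (xx instead of xxx) reported in this issue.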
Here's another one, f37 rootless, and in this one the test times out at 60m. Same thing in the first link above, I just hadn't noticed. @sbrivio-rh PTAL. |
Sorry, I didn't notice about this issue right away, I'm taking a look. Somewhat interestingly, socat always fails (gets interrupted) on the third connection (out of three): the expected data includes a number of What interrupts socat here (and whether socat should just handle |
I can't reproduce this on Fedora 37. I'm looking for a way to make the tests robust to Alternatively we could try to find out what signal I'm wondering, meanwhile, if we should consider a mitigation such as issuing just two connections (instead of three) for any test covering forwarding ranges of (TCP) ports, something like this:
@edsantiago, @Luap99, can this be tried in a CI run? Or should I submit a pull request before I know if it's sufficient or not? |
@sbrivio-rh the only way to try that in CI is to submit a PR. You can mark it DRAFT, WIP, and/or DO NOT MERGE for safety. |
@edsantiago, thanks for clarifying. I submitted a pull request (#17380), CI currently fails with:
I can retrigger that a few times I guess? If you want to speed this up maybe you can also skip this (unrelated?) part of the CI there. |
@sbrivio-rh CI is broken at the moment, you need to wait until we fix the issue then likely rebase. |
A friendly reminder that this issue had no activity for 30 days. |
The issue itself is not happening anymore, because I hid it with #17380, but I'd like to revert that and see if the original issue still occurs. |
Incidentally, the flake can still happen with the '2' setting: remote f37 rootless. |
I had a look at this, but so far I haven't managed to reproduce it. I reverted @sbrivio-rh's workaround patch, but the tests all still pass for me. This is with git podman with #19021 applied, and git pasta with a fix for bug 61 applied. I'm wondering if one of the other fixes we've made has also fixed this as a side effect. Is anyone else able to reproduce this with current versions? |
Usually these flakes only manifest in our CI, because it is so much slower than anything run locally, and that slowness triggers race conditions. |
Not really, but I never managed to reproduce this on my setup either (not even by adding delays in |
Drat, that's going to make this very hard to track down. |
Flake seen today in the wild, not in my PR (that is: with the lower
|
Ok, that's interesting. Sounds like the fact it always failed on the third one before was just a quirk of timing, not anything fundamental. And based on that, I was able to reproduce locally, by increasing the range size to 1000. So, I have something concrete I can debug now. |
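To make the shape of the failing check concrete, here is a rough loopback-only sketch of a "data sent equals data received" comparison across many ports. It is not the bats test from the podman suite and involves neither pasta nor socat: `mismatched_ports` is a hypothetical helper, and the ports are picked by the OS rather than forming a contiguous forwarded range.

```python
import socket
import threading


def mismatched_ports(n_ports, payload=b"x" * 4096):
    """Send payload to n_ports loopback listeners; return ports whose data differs."""
    # Assumption: binding to port 0 lets the OS choose each port, standing in
    # for the forwarded range used by the real test.
    listeners = []
    for _ in range(n_ports):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("127.0.0.1", 0))
        srv.listen(1)
        listeners.append(srv)

    received = {}

    def serve(srv):
        port = srv.getsockname()[1]
        conn, _ = srv.accept()
        with conn:
            chunks = []
            while True:
                data = conn.recv(65536)
                if not data:  # peer closed the connection: end of stream
                    break
                chunks.append(data)
        received[port] = b"".join(chunks)

    threads = [threading.Thread(target=serve, args=(srv,)) for srv in listeners]
    for t in threads:
        t.start()

    # One client connection per port; closing it signals end of data.
    for srv in listeners:
        with socket.create_connection(srv.getsockname()) as client:
            client.sendall(payload)

    for t in threads:
        t.join()
    for srv in listeners:
        srv.close()

    return sorted(p for p, data in received.items() if data != payload)


bad = mismatched_ports(16)
```

On plain loopback this should return an empty list. Raising `n_ports` toward the 1000 used in the local reproduction may run into per-process file descriptor limits, so treat the number as something to tune.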
Ok, I think I know what's going on. It looks like a bug in Viewing the problem under strace proved surprisingly tricky: I wasn't able to strace the entire bats run, because once I had a large enough number of ports that it reproduced frequently, strace slowed it down enough to just grind to a halt. But with a variety of tricks I eventually managed it. It appears that the The workaround is not to use an I have a draft patch doing that, and I'm running it through some testing now. I have had the test (with 1000 ports) time out once, but I haven't seen it hit the EINTR problem so far. |
…rward tests"

This reverts commit c2a24ab, which itself reverted 1c08f2e, which reverted e33f4e0.

The original e33f4e0 "pasta: Use two connections instead of three in TCP range forward tests" was a workaround to avoid intermittent errors in CI where the pasta networking port range forwarding tests would fail. It was reverted and unreverted when we thought we'd fixed the problem, but that turned out not to be the case.

We're now much more confident that we've genuinely found and fixed (or at least, worked around) the underlying problem, so we revert it again.

Link: containers#17287
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also happens in a different test, same symptom (xxx vs xx), same
socat EINTR