packetdrill: add_addr test is regularly failing: packets arriving before the expected time #312
Comments
Less regularly, the "server" test fails as well
This one is more surprising because the ADD_ADDR is supposed to be sent just after the accept and here it is sent a very long time after! (debug kernel config)
It looks like the root cause is the CI being very slow, so that:
<event starting the timeout, at time t=0>
No idea how to fix it...
a possible fix would be replacing
with:
and adjust the latter with the current tolerance with sed, before starting the script (with the assumption that the timeout must be greater than the tolerance)
On very slow hosts, injected packets cause trouble because they reset timers in Packetdrill, see: multipath-tcp/mptcp_net-next#312

Add a new "safer" version without these injected packets to monitor if everything is OK like that.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
ADD_ADDR retransmissions are expected to come ~1s after the last sent packet. In Packetdrill, the timing is compared to the last packet. We would like to tell Packetdrill that the packet is expected to come "1s not after the previous packet but the one before" but that's not possible.

In this test, the last packet before the ADD_ADDR retransmission is not the ADD_ADDR that was sent before but the injected and faulty echo ADD_ADDR. With a (very) slow host, it can take a bit of time to inject the packet. Because of that, the retransmitted packet can arrive earlier than expected: if the faulty echo ADD_ADDR injection takes X sec, the retransmission of the ADD_ADDR by the kernel will take (1 - X) sec. If X is bigger than the tolerance, the test fails, which seems to happen regularly on the public CI:

  add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 3.214190 sec but happened at 2.384757 sec; tolerance 0.800000 sec
  script packet: 3.214190 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>
  actual packet: 2.384757 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>

We should then accept packets being sent less than one second before the injected and faulty echo ADD_ADDR.

Even worse with a debug kernel config:

  add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 10.335930 sec but happened at 5.908318 sec; tolerance 2.000000 sec
  script packet: 10.335930 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>
  actual packet: 5.908318 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>

Injecting the faulty echo and processing packets sent by the kernel can take time, and we have situations where packets arrive a few seconds before the time expected by Packetdrill!

Sadly, we cannot tell Packetdrill the packet is expected to be sent in the past. So we need to increase the tolerance a bit. But that's fine to do because a new test has been added in the parent commit: it focuses on the ADD_ADDR retransmissions without injecting other packets in between. This other test can have a stricter expected time.

Closes: multipath-tcp/mptcp_net-next#312
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
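The timing argument in the commit message above can be checked with a little arithmetic. The sketch below is a simplified model, not Packetdrill's actual implementation: the function names are hypothetical, the 1 s retransmission timeout is the approximate value quoted in the message, and the tolerance check is assumed to be a plain absolute difference. The input numbers are taken from the two error logs.

```python
# Simplified model of the timing check described in the commit message.
# Assumptions (not taken from Packetdrill's source): the check is a plain
# absolute difference, and the ADD_ADDR retransmission timeout is ~1 s.

RETRANS_TIMEOUT = 1.0  # seconds, approximate value quoted in the message


def retrans_offset_after_injection(inject_delay, timeout=RETRANS_TIMEOUT):
    """Time between the injected packet and the kernel's retransmission.

    The kernel timer started with the first ADD_ADDR, so if injecting the
    faulty echo took `inject_delay` seconds, only `timeout - inject_delay`
    seconds remain once the injection is done.
    """
    return timeout - inject_delay


def within_tolerance(expected, actual, tolerance):
    """Return True if the observed time is close enough to the scripted one."""
    return abs(actual - expected) <= tolerance


# Values from the two logs: (expected, actual, tolerance), in seconds.
normal_run = (3.214190, 2.384757, 0.8)
debug_run = (10.335930, 5.908318, 2.0)

for expected, actual, tolerance in (normal_run, debug_run):
    early_by = expected - actual
    ok = within_tolerance(expected, actual, tolerance)
    print(f"early by {early_by:.6f}s, tolerance {tolerance:.1f}s, ok={ok}")
```

In both logged runs the packet arrives earlier than the tolerance allows, which is why the commit raises the tolerance for this test and relies on the separate test without injected packets for the stricter check.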
This PR should help to reduce the false positives seen by the CI: multipath-tcp/packetdrill#98
Title changed from "add_addr test is regularly failing: packets arriving after the expected time" to "add_addr test is regularly failing: packets arriving before the expected time"
On very slow hosts, injected packets cause trouble because they reset timers in Packetdrill, see: multipath-tcp/mptcp_net-next#312

Add a new "safer" version without these injected packets to monitor if everything is OK like that.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
(cherry picked from commit 59cd5dd)
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
ADD_ADDR retransmissions are expected to come ~1s after the last sent packet. In Packetdrill, the timing is compared to the last packet. We would like to tell Packetdrill that the packet is expected to come "1s not after the previous packet but the one before" but that's not possible.

In this test, the last packet before the ADD_ADDR retransmission is not the ADD_ADDR that was sent before but the injected and faulty echo ADD_ADDR. With a (very) slow host, it can take a bit of time to inject the packet. Because of that, the retransmitted packet can arrive earlier than expected: if the faulty echo ADD_ADDR injection takes X sec, the retransmission of the ADD_ADDR by the kernel will take (1 - X) sec. If X is bigger than the tolerance, the test fails, which seems to happen regularly on the public CI:

  add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 3.214190 sec but happened at 2.384757 sec; tolerance 0.800000 sec
  script packet: 3.214190 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>
  actual packet: 2.384757 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>

We should then accept packets being sent less than one second before the injected and faulty echo ADD_ADDR.

Even worse with a debug kernel config:

  add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 10.335930 sec but happened at 5.908318 sec; tolerance 2.000000 sec
  script packet: 10.335930 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>
  actual packet: 5.908318 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>

Injecting the faulty echo and processing packets sent by the kernel can take time, and we have situations where packets arrive a few seconds before the time expected by Packetdrill!

Sadly, we cannot tell Packetdrill the packet is expected to be sent in the past. So we need to increase the tolerance a bit. But that's fine to do because a new test has been added in the parent commit: it focuses on the ADD_ADDR retransmissions without injecting other packets in between. This other test can have a stricter expected time.

Closes: multipath-tcp/mptcp_net-next#312
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
(cherry picked from commit 7e14dec)
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
In case when is64 == 1 in emit(A64_REV32(is64, dst, dst), ctx) the generated insn reverses byte order for both high and low 32-bit words, resulting in an incorrect swap as indicated by the jit test:

  [ 9757.262607] test_bpf: #312 BSWAP 16: 0x0123456789abcdef -> 0xefcd jited:1 8 PASS
  [ 9757.264435] test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times)
  [ 9757.266260] test_bpf: #314 BSWAP 64: 0x0123456789abcdef -> 0x67452301 jited:1 8 PASS
  [ 9757.268000] test_bpf: #315 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 jited:1 8 PASS
  [ 9757.269686] test_bpf: #316 BSWAP 16: 0xfedcba9876543210 -> 0x1032 jited:1 8 PASS
  [ 9757.271380] test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times)
  [ 9757.273022] test_bpf: #318 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe jited:1 7 PASS
  [ 9757.274721] test_bpf: #319 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 jited:1 9 PASS

Fix this by forcing 32bit variant of rev32.

Fixes: 1104247 ("bpf, arm64: Support unconditional bswap")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Tested-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Message-ID: <20240321081809.158803-1-asavkov@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
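The difference between the two rev32 variants mentioned in this commit message can be illustrated on plain integers. This is only a sketch of the byte-reversal semantics. It does not model the JIT or the test harness, the function names are hypothetical, and it does not attempt to explain the exact wrong values printed in the log.

```python
# Illustration only: byte reversal on Python integers, not the arm64 JIT.
# Function names are hypothetical.


def bswap32(x):
    """Reverse the bytes of the low 32 bits; the upper bits are dropped."""
    low = x & 0xFFFFFFFF
    return int.from_bytes(low.to_bytes(4, "little"), "big")


def rev32_on_64bit_register(x):
    """Reverse the bytes inside each 32-bit word of a 64-bit value.

    This mirrors what the commit message describes for is64 == 1: both
    the high and the low word are byte-reversed, so the upper half of
    the result is not zero.
    """
    low = bswap32(x)
    high = bswap32(x >> 32)
    return (high << 32) | low


value = 0x0123456789ABCDEF
print(hex(bswap32(value)))                  # 0xefcdab89
print(hex(rev32_on_64bit_register(value)))  # 0x67452301efcdab89
```

The 32-bit variant gives the value the BSWAP 32 test expects (0xefcdab89), while the 64-bit variant leaves reversed bytes in the upper word as well.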
It is usually failing when executing the retry job (in v4, v6mapped and v6):
Note that on the public CI, the tests are running with a higher tolerance than what is written in the scripts:
Reproduced quite regularly these last days on the public CI: