packetdrill: `add_addr` test is regularly failing: packets arriving before the expected time #312

matttbe · 2022-10-11T12:34:40Z

It is usually failing when executing the retry job, in v4, v6mapped and v6):

# FAIL [/opt/packetdrill/gtests/net/mptcp/add_addr/add_addr_retry_v4.pkt (ipv4-mapped-v6)]
# stdout: 
# stderr: 
# add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 3.214190 sec but happened at 2.384757 sec; tolerance 0.800000 sec
# script packet:  3.214190 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>
# actual packet:  2.384757 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>

Note that on the public CI, the tests are running with a higher tolerance than what is written in the scripts:

times 4 with a non debug kernel config (0.8s here with this test)
times 10 with a debug kernel config (2s here with this test)

Reproduced quite regularly these last days on the public CI:

The text was updated successfully, but these errors were encountered:

matttbe · 2022-10-11T12:36:44Z

Less regularly, the "server" test fails as well

# FAIL [/opt/packetdrill/gtests/net/mptcp/add_addr/add_addr4_server.pkt (ipv6)]
# stdout: 
# stderr: 
# add_addr4_server.pkt:20: error handling packet: timing error: expected outbound packet at 7.925402 sec but happened at 14.886888 sec; tolerance 2.000000 sec
# script packet:  7.925402 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 198.51.100.2 hmac: 4866319833858278738>
# actual packet: 14.886888 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 198.51.100.2 hmac: 4866319833858278738>

This one is more surprising because the ADD_ADDR is supposed to be sent just after the accept and here it is sent a very long time after! (debug kernel config)

summary - task

pabeni · 2022-10-12T17:46:56Z

It looks like the root cause is the CI being very slow, so that:

No idea how to fix it...

pabeni · 2022-10-18T15:26:54Z

a possible fix would be replacing

+1 <timeout pkt>

with:

+0.9~+1.0 <timeout pkt>

and adjust the latter with the current tolerance with sed, before starting the script

(With the assumption that the timeout must be greater then the tolerance)

On very slow hosts, injected packets cause troubles because it reset timers in Packetdrill, see: multipath-tcp/mptcp_net-next#312 Add a new "safer" version without these injected packets to monitor if everything is OK like that. Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>

ADD_ADDR restransmissions are expected to come ~1s after the last sent packet. In packetdrill, the timing is compared to the last packet. We would like to tell Packetdrill that the packet is expected to come "1s not after the previous packet but the one before" but that's not possible. In this test the last packet before the ADD_ADDR retransmission is not the ADD_ADDR that is sent before but it is the injected and falty echo ADD_ADR. With a (very) slow host, it can takes a bit of time to inject the packet. Because of that, the retransmitted packet can arrive earlier than expected: if the falty echo ADD_ADDR injection takes X sec, the retransmission of the ADD_ADDR by the kernel will take (1 - X) sec. If X is bigger than the tolerance, the test fails which seems to happen regularly on the public CI. add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 3.214190 sec but happened at 2.384757 sec; tolerance 0.800000 sec script packet: 3.214190 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296> actual packet: 2.384757 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296> We should then accept packets being sent less than one second before the injected and falty echo ADD_ADDR. Even worst with a debug kernel config: add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 10.335930 sec but happened at 5.908318 sec; tolerance 2.000000 sec script packet: 10.335930 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620> actual packet: 5.908318 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620> Injecting the falty echo and processing packets sent by the kernel can take time and we have situations where packets arrive a few seconds before the expected time by Packetdrill! Sadly, we cannot tell Packetdrill the packet is expected to be sent in the past. So we need to increment the tolerance a bit. But that's find to do that because a new test has been added in the parent commit: it is focussing on the ADD_ADDR retransmissions without injecting other packets in between. This other test can have stricter expected time. Closes: multipath-tcp/mptcp_net-next#312 Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>

On very slow hosts, injected packets cause troubles because it reset timers in Packetdrill, see: multipath-tcp/mptcp_net-next#312 Add a new "safer" version without these injected packets to monitor if everything is OK like that. Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>

ADD_ADDR restransmissions are expected to come ~1s after the last sent packet. In packetdrill, the timing is compared to the last packet. We would like to tell Packetdrill that the packet is expected to come "1s not after the previous packet but the one before" but that's not possible. In this test the last packet before the ADD_ADDR retransmission is not the ADD_ADDR that is sent before but it is the injected and faulty echo ADD_ADR. With a (very) slow host, it can takes a bit of time to inject the packet. Because of that, the retransmitted packet can arrive earlier than expected: if the faulty echo ADD_ADDR injection takes X sec, the retransmission of the ADD_ADDR by the kernel will take (1 - X) sec. If X is bigger than the tolerance, the test fails which seems to happen regularly on the public CI. add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 3.214190 sec but happened at 2.384757 sec; tolerance 0.800000 sec script packet: 3.214190 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296> actual packet: 2.384757 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296> We should then accept packets being sent less than one second before the injected and faulty echo ADD_ADDR. Even worst with a debug kernel config: add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 10.335930 sec but happened at 5.908318 sec; tolerance 2.000000 sec script packet: 10.335930 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620> actual packet: 5.908318 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620> Injecting the faulty echo and processing packets sent by the kernel can take time and we have situations where packets arrive a few seconds before the expected time by Packetdrill! Sadly, we cannot tell Packetdrill the packet is expected to be sent in the past. So we need to increment the tolerance a bit. But that's find to do that because a new test has been added in the parent commit: it is focussing on the ADD_ADDR retransmissions without injecting other packets in between. This other test can have stricter expected time. Closes: multipath-tcp/mptcp_net-next#312 Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>

matttbe · 2022-10-20T13:38:38Z

This PR should help to reduce the false positives seen by the CI: multipath-tcp/packetdrill#98

On very slow hosts, injected packets cause troubles because it reset timers in Packetdrill, see: multipath-tcp/mptcp_net-next#312 Add a new "safer" version without these injected packets to monitor if everything is OK like that. Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> (cherry picked from commit 59cd5dd) Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>

ADD_ADDR restransmissions are expected to come ~1s after the last sent packet. In packetdrill, the timing is compared to the last packet. We would like to tell Packetdrill that the packet is expected to come "1s not after the previous packet but the one before" but that's not possible. In this test the last packet before the ADD_ADDR retransmission is not the ADD_ADDR that is sent before but it is the injected and faulty echo ADD_ADR. With a (very) slow host, it can takes a bit of time to inject the packet. Because of that, the retransmitted packet can arrive earlier than expected: if the faulty echo ADD_ADDR injection takes X sec, the retransmission of the ADD_ADDR by the kernel will take (1 - X) sec. If X is bigger than the tolerance, the test fails which seems to happen regularly on the public CI. add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 3.214190 sec but happened at 2.384757 sec; tolerance 0.800000 sec script packet: 3.214190 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296> actual packet: 2.384757 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296> We should then accept packets being sent less than one second before the injected and faulty echo ADD_ADDR. Even worst with a debug kernel config: add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 10.335930 sec but happened at 5.908318 sec; tolerance 2.000000 sec script packet: 10.335930 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620> actual packet: 5.908318 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620> Injecting the faulty echo and processing packets sent by the kernel can take time and we have situations where packets arrive a few seconds before the expected time by Packetdrill! Sadly, we cannot tell Packetdrill the packet is expected to be sent in the past. So we need to increment the tolerance a bit. But that's find to do that because a new test has been added in the parent commit: it is focussing on the ADD_ADDR retransmissions without injecting other packets in between. This other test can have stricter expected time. Closes: multipath-tcp/mptcp_net-next#312 Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> (cherry picked from commit 7e14dec) Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>

In case when is64 == 1 in emit(A64_REV32(is64, dst, dst), ctx) the generated insn reverses byte order for both high and low 32-bit words, resuling in an incorrect swap as indicated by the jit test: [ 9757.262607] test_bpf: #312 BSWAP 16: 0x0123456789abcdef -> 0xefcd jited:1 8 PASS [ 9757.264435] test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times) [ 9757.266260] test_bpf: #314 BSWAP 64: 0x0123456789abcdef -> 0x67452301 jited:1 8 PASS [ 9757.268000] test_bpf: #315 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 jited:1 8 PASS [ 9757.269686] test_bpf: #316 BSWAP 16: 0xfedcba9876543210 -> 0x1032 jited:1 8 PASS [ 9757.271380] test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times) [ 9757.273022] test_bpf: #318 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe jited:1 7 PASS [ 9757.274721] test_bpf: #319 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 jited:1 9 PASS Fix this by forcing 32bit variant of rev32. Fixes: 1104247 ("bpf, arm64: Support unconditional bswap") Signed-off-by: Artem Savkov <asavkov@redhat.com> Tested-by: Puranjay Mohan <puranjay12@gmail.com> Acked-by: Puranjay Mohan <puranjay12@gmail.com> Acked-by: Xu Kuohai <xukuohai@huawei.com> Message-ID: <20240321081809.158803-1-asavkov@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

matttbe added bug packetdrill labels Oct 11, 2022

matttbe mentioned this issue Oct 20, 2022

mptcp: add_addr: more tolerant with retrans timeout after errors multipath-tcp/packetdrill#98

Merged

matttbe self-assigned this Oct 20, 2022

matttbe mentioned this issue Oct 20, 2022

packetdrill: more tests failing due to packets arriving later than expected #248

Closed

matttbe changed the title ~~packetdrill: add_addr test is regularly failing: packets arriving after the expected time~~ packetdrill: add_addr test is regularly failing: packets arriving before the expected time Oct 20, 2022

dcaratti closed this as completed in multipath-tcp/packetdrill#98 Nov 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

packetdrill: `add_addr` test is regularly failing: packets arriving before the expected time #312

packetdrill: `add_addr` test is regularly failing: packets arriving before the expected time #312

matttbe commented Oct 11, 2022

matttbe commented Oct 11, 2022

pabeni commented Oct 12, 2022

pabeni commented Oct 18, 2022 •

edited by matttbe

Loading

matttbe commented Oct 20, 2022

packetdrill: add_addr test is regularly failing: packets arriving before the expected time #312

packetdrill: add_addr test is regularly failing: packets arriving before the expected time #312

Comments

matttbe commented Oct 11, 2022

matttbe commented Oct 11, 2022

pabeni commented Oct 12, 2022

pabeni commented Oct 18, 2022 • edited by matttbe Loading

matttbe commented Oct 20, 2022

packetdrill: `add_addr` test is regularly failing: packets arriving before the expected time #312

packetdrill: `add_addr` test is regularly failing: packets arriving before the expected time #312

pabeni commented Oct 18, 2022 •

edited by matttbe

Loading