
segmentation fault on playing pcap file ( sipp 3.7.2 ) #673

Closed
gitkoyot opened this issue Dec 11, 2023 · 11 comments · Fixed by #698

Comments

@gitkoyot

Hello,
I have a scenario where both UAC and UAS are playing 10 sec audio. I noticed that sometimes UAS part crashes. When I executed it with valgrind this is what I found:

==6613== Thread 2:
==6613== Invalid read of size 2
==6613==    at 0x185541: send_packets (send_packets.c:241)
==6613==    by 0x13936F: send_wrapper(void*) (call.cpp:6824)
==6613==    by 0x53183EB: start_thread (pthread_create.c:444)
==6613==    by 0x539896F: clone (clone.S:100)
==6613==  Address 0x647d988 is 8 bytes inside a block of size 40 free'd
==6613==    at 0x48431EF: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==6613==    by 0x1170B6: CAction::setPcapArgs(char const*) (actions.cpp:536)
==6613==    by 0x136152: call::executeAction(char const*, message*) (call.cpp:6083)
==6613==    by 0x12456E: call::executeMessage(message*) (call.cpp:1900)
==6613==    by 0x1257E4: call::run() (call.cpp:2214)
==6613==    by 0x1A69F1: traffic_thread(int&, int&) (sipp.cpp:589)
==6613==    by 0x1AB53C: main (sipp.cpp:2147)
==6613==  Block was alloc'd at
==6613==    at 0x4840808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==6613==    by 0x1170DA: CAction::setPcapArgs(char const*) (actions.cpp:541)
==6613==    by 0x136152: call::executeAction(char const*, message*) (call.cpp:6083)
==6613==    by 0x12456E: call::executeMessage(message*) (call.cpp:1900)
==6613==    by 0x1257E4: call::run() (call.cpp:2214)
==6613==    by 0x1A69F1: traffic_thread(int&, int&) (sipp.cpp:589)
==6613==    by 0x1AB53C: main (sipp.cpp:2147)

The stack trace also points to a similar location:

Thread 11 "sipp" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff24bd6c0 (LWP 7449)]
...

(gdb) bt full
#0  0x00005555555d1502 in send_packets (play_args=0x555555a94a90) at ./src/send_packets.c:240
        __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140737258446044, 6969434371858591346, -50016, 0, 140737488343392, 
                140737250054144, 6969434371602738802, 3885807745694920306}, __mask_was_saved = 0}}, __pad = {0x7ffff24b0400, 0x0, 0x0, 0x0}}
        __cancel_routine = 0x5555555d1072 <send_packets_cleanup>
        __cancel_arg = 0x7ffff24b0314
        __not_first_call = 0
        pkt_max = 0x555555acc118
        from_port = 0x555555a94b1a
        didsleep = {tv_sec = 0, tv_usec = 0}
        pkt_index = 0x0
        temp_sum = 0
        port_diff = 0
        to_port = 0x555555a94a9a
        start = {tv_sec = 0, tv_usec = 0}
        pkts = 0x555555a00d20
        bind_addr = {ss_family = 2, __ss_padding = "\000\000\177\000\001\001", '\000' <repeats 111 times>, __ss_align = 0}
        ret = 0
        from6 = {sin6_family = 0, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, 
              __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}
        sock = 27
        last = {tv_sec = 0, tv_usec = 0}
        to = 0x555555a94a98
        from = 0x555555a94b18
        udp = 0x7ffff24b0560
        to6 = {sin6_family = 0, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, 
              __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}
        buffer = '\000' <repeats 1499 times>
        len = 16
        __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140737258446044, 6969434371858591346, -50016, 0, 140737488343392, 
                140737250054144, 6969434371602738802, 3885807745510764146}, __mask_was_saved = 0}}, __pad = {0x7ffff24b0bb0, 0x0, 0x0, 0x0}}
        __cancel_routine = 0x5555555d109a <send_packets_pcap_cleanup>
        __cancel_arg = 0x555555a94a90
        __not_first_call = 0
#1  0x0000555555585370 in send_wrapper (arg=0x555555a94a90) at ./src/call.cpp:6824
        s = 0x555555a94a90
#2  0x00007ffff742b3ec in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737341731120, -6969457636240036238, -50016, 0, 140737488343392, 140737250054144, 
                6969434371883757170, 6969441033572544114}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, 
              cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#3  0x00007ffff74aba4c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I suspect (and I might be wrong) that the issue is that the call is shorter than the audio file. Or there is another race condition, or my scripts (attached in a separate message) are wrong.

@gitkoyot
Author

I tried to reduce the issue to the following scenarios. When the call rate is set to 10 CPS, everything works on my machine. When I switch to 100 CPS, it crashes.

g711PCMUstream.zip
uac.sh.txt
uac.xml.txt
uas.sh.txt
uas.xml.txt
valgrind-uas.txt.txt

Please don't mind the UAC crashing; it crashes because the connection is reset by the peer.

@man1207

man1207 commented Feb 16, 2024

Hi! I observe the same behaviour.

<nop> <action> <exec play_pcap_audio="g711a.pcap" /> </action> </nop>

This crashes SIPp with a segmentation fault when CPS increases above 35-40, with ~1500 concurrent calls.

@peter-oneill
Contributor

peter-oneill commented Mar 19, 2024

Caused by a use-after-free introduced by d8899fc under #521.

I'll look into a fix

@peter-oneill
Contributor

I suspect this is a duplicate of #610 and #576.

@peter-oneill
Contributor

peter-oneill commented Mar 19, 2024

@orgads I'm considering whether the best solution here might be to revert #521

The high-level problem with the PR is that it changes the pcap play and RTP stream actions to hold per-call variables, but without changing the underlying control blocks to represent that. Instead, actions that are still in use for previous calls are partially freed before being altered and reused for a new call. The issues I've identified coming from this are:

  1. segfaults - see this and other linked issues
  2. other use-after-free bugs (the issue I've hit) - the source and destination ports of an existing stream can be altered when a new stream is started
  3. scalability - previously all calls with PCAP play would iterate through the same set of packet buffers; now, even when using the same PCAP file, every call creates its own set of packet buffers
  4. memory leak - the buffers allocated in the item above are never freed

How crucial is #521 to you? I count at least 6 users from the issues (me included) who have hit this, and it seems to me that the best way to resolve this for now would be to remove the PR, either pending a suitable reworking to resolve the issues above, or indefinitely if the feature isn't important enough.

@orgads
Contributor

orgads commented Mar 19, 2024

I need it. We use it all the time.

Is it too hard to fix?

@peter-oneill
Contributor

peter-oneill commented Mar 20, 2024

Point 3 above isn't fixable - it turns the PCAP handling from O(1) to O(N), where N is the number of concurrent calls. It intrinsically changes the scalability of running SIPp with media.

The other three points (1, 2, 4) would be resolvable, but the fix would be bigger than the initial code changes.

I'm surprised you don't have issues; what sort of scenarios are you running? Do you run concurrent media streams using PCAP play?

@orgads
Contributor

orgads commented Mar 20, 2024

Is it only for pcap? We use WAV streaming.

@orgads
Contributor

orgads commented Mar 20, 2024

You may revert the change. I'll try to look into it if I find the time for it.

@peter-oneill
Contributor

Looks like WAV files use RTPStream rather than PCAP play? When I looked at the effect of the code changes, it appears RTPStream doesn't suffer the same issues as PCAP play: logic in rtpstream.cpp suggests each individual file is cached (rather than one copy per call, even when the files are the same), so the same use-after-free issues are avoided.

I could remove the feature for PCAP play without removing it for RTPStream?

@orgads
Contributor

orgads commented Mar 20, 2024

Sure, thank you.

peter-oneill added commits to peter-oneill/sipp that referenced this issue on Mar 20 and Mar 21, 2024
@orgads orgads linked a pull request Mar 22, 2024 that will close this issue
4 participants