
teamtalk 5.8.2 crashes on older processors #1119

Closed
xogium opened this issue Nov 5, 2021 · 20 comments

@xogium

xogium commented Nov 5, 2021

Hi,
This issue is very similar to #1067, but it affects at least Linux on x86_64 with older processors such as the Core 2 Duo. As soon as people with one of these older processors join an Opus channel, their client crashes immediately, the same way it did on Mac.

I have no way to test Windows on an older CPU to find out whether it is also a problem there.

@bear101
Contributor

bear101 commented Nov 6, 2021

Can you download the archive here and see if it also crashes? https://github.com/BearWare/TeamTalk5/actions/runs/1425552219

@xogium
Author

xogium commented Nov 6, 2021

Hi,
We've tested with my friend's Core 2 Duo; it still crashes.

@Jookia

Jookia commented Nov 6, 2021

Disassembling the binary gives something like this:

0000000000496910 <silk_VAD_Init@@Base>:
  496910:       48 c7 07 00 00 00 00    movq   $0x0,(%rdi)
  496917:       48 c7 47 68 00 00 00    movq   $0x0,0x68(%rdi)
  49691e:       00 
  49691f:       48 89 fe                mov    %rdi,%rsi
  496922:       48 8d 7f 08             lea    0x8(%rdi),%rdi
  496926:       48 89 f1                mov    %rsi,%rcx
  496929:       31 c0                   xor    %eax,%eax
  49692b:       66 0f 6f 05 7d 37 e2    movdqa 0xe2377d(%rip),%xmm0        # 12ba0b0 <eMeans@@Base+0x2590>
  496932:       00 
  496933:       48 83 e7 f8             and    $0xfffffffffffffff8,%rdi
  496937:       48 29 f9                sub    %rdi,%rcx
  49693a:       83 c1 70                add    $0x70,%ecx
  49693d:       c1 e9 03                shr    $0x3,%ecx
  496940:       f3 48 ab                rep stos %rax,%es:(%rdi)
  496943:       0f 11 46 5c             movups %xmm0,0x5c(%rsi)
  496947:       66 0f 38 40 05 70 37    pmulld 0xe23770(%rip),%xmm0        # 12ba0c0 <eMeans@@Base+0x25a0>
  49694e:       e2 00 

pmulld is an SSE4.1 instruction and is executed unconditionally, so systems without SSE4.1 will crash here.

libopus's source code doesn't deliberately use SSE4.1 (or AVX) here, so it seems the compiler is auto-vectorizing with these instructions?

Building from the v5.8.2 tag on my own machine (Arch Linux, Ryzen 3700X), at least, gives assembly like this:

00000000002e90c0 <silk_VAD_Init>:
  2e90c0:       48 c7 07 00 00 00 00    movq   $0x0,(%rdi)
  2e90c7:       48 89 fa                mov    %rdi,%rdx
  2e90ca:       48 8d 7f 08             lea    0x8(%rdi),%rdi
  2e90ce:       31 c0                   xor    %eax,%eax
  2e90d0:       48 83 e7 f8             and    $0xfffffffffffffff8,%rdi
  2e90d4:       48 89 d1                mov    %rdx,%rcx
  2e90d7:       66 0f 6f 05 31 df 0d    movdqa 0xddf31(%rip),%xmm0        # 3c7010 <tiltWeights+0x10>
  2e90de:       00 
  2e90df:       48 29 f9                sub    %rdi,%rcx
  2e90e2:       83 c1 70                add    $0x70,%ecx
  2e90e5:       c1 e9 03                shr    $0x3,%ecx
  2e90e8:       f3 48 ab                rep stos %rax,%es:(%rdi)
  2e90eb:       0f 11 42 3c             movups %xmm0,0x3c(%rdx)
  2e90ef:       66 0f 6f 05 29 df 0d    movdqa 0xddf29(%rip),%xmm0        # 3c7020 <tiltWeights+0x20>
  2e90f6:       00 
  2e90f7:       c7 42 6c 0f 00 00 00    movl   $0xf,0x6c(%rdx)
  2e90fe:       0f 11 42 4c             movups %xmm0,0x4c(%rdx)

I disabled every BUILD_TEAMTALK_* option except CORE, all FEATURE_* options except Opus, and all TOOLCHAIN_* options except OPUS and ACE.

I got the same result building master. Perhaps some build options are being set in the GitHub builds?

@Jookia

Jookia commented Nov 7, 2021

Here's the code from v1.8.2 that works:

0x00483f10      mov     r8, rdi
0x00483f13      mov     qword [rdi], 0
0x00483f1a      mov     qword [rdi + 0x68], 0
0x00483f22      lea     rdi, [rdi + 8]
0x00483f26      mov     rcx, r8
0x00483f29      xor     eax, eax
0x00483f2b      lea     rsi, [r8 + 0x3c]
0x00483f2f      lea     r9, [r8 + 0x4c]
0x00483f33      and     rdi, 0xfffffffffffffff8
0x00483f37      sub     rcx, rdi
0x00483f3a      add     ecx, 0x70  ; fcn.00000070
0x00483f3d      shr     ecx, 3
0x00483f40      rep     stosq qword [rdi], rax
0x00483f43      movabs  rax, 0x1900000032 ; '2' ; 107374182450
0x00483f4d      mov     ecx, 0x32  ; '2'
0x00483f52      mov     edi, 0x7fffffff
0x00483f57      mov     qword [r8 + 0x5c], rax

No auto-vectorization at all!

v1.8.2 is compiled with GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
v1.8.1 is compiled with GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

@bear101
Contributor

bear101 commented Nov 7, 2021

SSE4.1 can be disabled using a configure option for OPUS: 3bbb812

@Jookia

Jookia commented Nov 7, 2021

That commit seemed to break CI? Could you upload a binary for testing?

@bear101
Contributor

bear101 commented Nov 7, 2021

Hm, it didn't toggle the options I wanted :( Now it compiles like this:
/usr/bin/cc -DENABLE_HARDENING -DFORTIFY_SOURCE=2 -DHAVE_CONFIG_H -DOPUS_BUILD -DOPUS_HAVE_RTCD -DOPUS_X86_MAY_HAVE_SSE -DOPUS_X86_MAY_HAVE_SSE2 -DOPUS_X86_MAY_HAVE_SSE4_1 -DOPUS_X86_PRESUME_SSE -DOPUS_X86_PRESUME_SSE2 -DVAR_ARRAYS -I/home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src/include -I/home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src-build -I/home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src -I/home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src/celt -I/home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src/silk -I/home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src/silk/float -O3 -DNDEBUG -fPIC -msse -msse2 -msse4.1 -mavx -fstack-protector-strong -MD -MT CMakeFiles/opus.dir/src/opus_multistream.c.o -MF CMakeFiles/opus.dir/src/opus_multistream.c.o.d -o CMakeFiles/opus.dir/src/opus_multistream.c.o -c /home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src/src/opus_multistream.c

I'm trying in this commit now: d1f7642

@bear101
Contributor

bear101 commented Nov 7, 2021

Hm, it looks more and more like a bug in OPUS's CMake file. Now it compiles without -mavx and -msse4.1, but OPUS_X86_MAY_HAVE_SSE4_1 is still there:
/usr/bin/cc -DENABLE_HARDENING -DFORTIFY_SOURCE=2 -DHAVE_CONFIG_H -DOPUS_BUILD -DOPUS_HAVE_RTCD -DOPUS_X86_MAY_HAVE_SSE -DOPUS_X86_MAY_HAVE_SSE2 -DOPUS_X86_MAY_HAVE_SSE4_1 -DOPUS_X86_PRESUME_SSE -DOPUS_X86_PRESUME_SSE2 -DVAR_ARRAYS -I/home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src/include -I/home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src-build -I/home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src -I/home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src/celt -I/home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src/silk -I/home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src/silk/float -O3 -DNDEBUG -fPIC -msse -msse2 -fstack-protector-strong -MD -MT CMakeFiles/opus.dir/src/opus.c.o -MF CMakeFiles/opus.dir/src/opus.c.o.d -o CMakeFiles/opus.dir/src/opus.c.o -c /home/runner/work/TeamTalk5/TeamTalk5/Build/build-ubuntu64/Library/TeamTalkLib/build/opus/build/opus/src/opus-src/src/opus.c

@Jookia

Jookia commented Nov 7, 2021

MAY_HAVE_SSE4_1 is fine, I think, since that's for runtime detection? The problem is that the compiler needs -msse4.1 and -mavx to build the runtime-selected SIMD code for these instruction sets.

I enabled SSE4_1_SUPPORTED and AVX_SUPPORTED and added this line to the Opus CMakeLists.txt: add_definitions("-fno-tree-vectorize")

This stops GCC from adding SIMD when not asked to, at least with current versions of GCC. A longer-term solution is fixing the Opus code to use SIMD only where it needs to, rather than saying 'I have SIMD, automatically generate code for it in addition to my hand-written code'.

@Jookia

Jookia commented Nov 7, 2021

Looking at upstream libopus, its CMakeLists.txt does in fact add -msse and friends only to individual files. Commit 927de8453c502586c03e25c169ec08f2a93ebc02 fixes this. The upstream bug for this issue is xiph/opus#154, and it has been fixed.

@Jookia

Jookia commented Nov 7, 2021

Another project also hit this exact bug: vircadia/vircadia-native-core#429

@Jookia

Jookia commented Nov 7, 2021

Sorry for the avalanche of notifications. I think upgrading to the unreleased Opus (git master) should let you fix this issue without micromanaging the AVX/SSE flags. In the short term, -fno-tree-vectorize should have the same effect.

Edit: I took apart the Mac binaries too. The one that works (build 5055) has this code at the spot where the other one crashes:

_silk_VAD_Init (int64_t arg1);
; arg int64_t arg1 @ rdi
0x001cd5b0      push    rbp
0x001cd5b1      mov     rbp, rsp
0x001cd5b4      xorps   xmm0, xmm0
0x001cd5b7      movups  xmmword [rdi + 0x60], xmm0 ; arg1
0x001cd5bb      movups  xmmword [rdi + 0x50], xmm0 ; arg1
0x001cd5bf      movups  xmmword [rdi + 0x40], xmm0 ; arg1
0x001cd5c3      movups  xmmword [rdi + 0x30], xmm0 ; arg1
0x001cd5c7      movups  xmmword [rdi + 0x20], xmm0 ; arg1
0x001cd5cb      movups  xmmword [rdi + 0x10], xmm0 ; arg1
0x001cd5cf      movups  xmmword [rdi], xmm0 ; arg1
0x001cd5d2      movaps  xmm0, xmmword [0x018a7e30]
0x001cd5d9      movups  xmmword [rdi + 0x5c], xmm0 ; arg1
0x001cd5dd      movaps  xmm0, xmmword [0x018a7e40]
0x001cd5e4      movups  xmmword [rdi + 0x3c], xmm0 ; arg1
0x001cd5e8      movaps  xmm0, xmmword [0x018a7e50]
0x001cd5ef      movups  xmmword [rdi + 0x4c], xmm0 ; arg1
0x001cd5f3      mov     dword [rdi + 0x6c], 0xf ; arg1
0x001cd5fa      add     rdi, 0x28  ; fcn.00000028 ; arg1
0x001cd5fe      lea     rsi, [0x018a7e60]
0x001cd605      mov     edx, 0x10  ; fcn.00000010
0x001cd60a      call    memset_pattern16 ; sym.imp.memset_pattern16
0x001cd60f      xor     eax, eax
0x001cd611      pop     rbp
0x001cd612      ret
0x001cd613      nop     word cs:[rax + rax]
0x001cd61d      nop     dword [rax]

which doesn't seem to contain any SSE4.1 or AVX, while the crashing build has this:

  ;-- func.001ce7c0:
_silk_VAD_Init (int64_t arg1);
; arg int64_t arg1 @ rdi
0x001ce7c0      push    rbp
0x001ce7c1      mov     rbp, rsp
0x001ce7c4      vxorps  xmm0, xmm0, xmm0
0x001ce7c8      vmovups ymmword [rdi + 0x50], ymm0
0x001ce7cd      vmovups ymmword [rdi + 0x40], ymm0
0x001ce7d2      vmovups ymmword [rdi + 0x20], ymm0
0x001ce7d7      vmovups ymmword [rdi], ymm0
0x001ce7db      vmovaps xmm0, xmmword [0x018a7fd0]
0x001ce7e3      vmovups xmmword [rdi + 0x5c], xmm0
0x001ce7e8      vmovaps ymm0, ymmword [0x018a8000]
0x001ce7f0      vmovups ymmword [rdi + 0x3c], ymm0
0x001ce7f5      mov     dword [rdi + 0x6c], 0xf ; arg1
0x001ce7fc      add     rdi, 0x28  ; fcn.00000028 ; arg1
0x001ce800      lea     rsi, [0x018a7fe0]
0x001ce807      mov     edx, 0x10  ; fcn.00000010
0x001ce80c      vzeroupper
0x001ce80f      call    memset_pattern16 ; sym.imp.memset_pattern16
0x001ce814      xor     eax, eax
0x001ce816      pop     rbp
0x001ce817      ret
0x001ce818      nop     dword [rax + rax]

You can see that it uses AVX ymm registers instead of SSE xmm registers.

@bear101
Contributor

bear101 commented Nov 7, 2021

> The MAY_HAVE_SSE4_1 is fine I think as that's runtime detection? The problem is that the compiler needs -msse4.1 and -mavx to support building runtime SIMD for these instruction sets.

If OPUS is compiled with -mavx, the compiler will generate AVX instructions everywhere, including in non-assembly code. That is, "normal" C code will contain AVX instructions and cause compatibility problems on non-AVX CPUs.

@Jookia

Jookia commented Nov 7, 2021 via email

@bear101
Contributor

bear101 commented Nov 7, 2021

So what CMake options would you provide here to make it work on a Core 2 Duo?

@Jookia

Jookia commented Nov 8, 2021

I'd remove '-DSSE4_1_SUPPORTED=OFF -DAVX_SUPPORTED=OFF', then add add_definitions("-fno-tree-vectorize") to the Opus git CMakeLists.txt in your other repo. This will stop the compiler from automatically emitting SIMD instructions.

Alternatively, I'd remove '-DSSE4_1_SUPPORTED=OFF -DAVX_SUPPORTED=OFF' and upgrade to Opus git master, which has a better fix: it passes -mavx only to its AVX files rather than to the rest of the project, and the same goes for -msse. See xiph/opus#154

bear101 added this to the TeamTalk v5.8.3 milestone Nov 9, 2021
@bear101
Contributor

bear101 commented Jan 4, 2022

@xogium If you run this Linux client application do you still experience the crash:
https://github.com/BearWare/TeamTalk5/actions/runs/1650821580
These are build artifacts from GitHub Actions.

@xogium
Author

xogium commented Jan 10, 2022

Hi,

Unfortunately, I haven't been able to get my friend to test this yet.

I'll try to get him to test it ASAP.

Thanks!

@Jookia

Jookia commented Jan 10, 2022 via email

@bear101
Contributor

bear101 commented Jan 12, 2022

Reopen if it's still an issue.

Projects
None yet
Development

No branches or pull requests

3 participants