-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Why do we use -fno-inline-functions? #5212
Comments
Just a test, see thesofproject#5212 Signed-off-by: Marc Herbert <marc.herbert@intel.com>
From @andyross about Zephyr:
|
#5213 test removes the flag and shows "typical" failures, not worse than usual. |
This was based on historical XCC settings so will need more analysis for code size, perf and stability. |
cc: @aborisovich , @mmaka1 , @mwasko , @abonislawski |
Ok so looking at standards and citating latest C99 documentation WG14 N1256:
Well maybe we have a lot of definitions of modifiable objects with static storage duration inside a functions that have a external linkage? It seems true to me as there are tons of macros defining some static structs in the function bodies? Next thing its probably so called "i need to control everything C programmer habits".
As far as I know there had been tons of effort invested into compilers so they can effectively make decisions on what is more effective - inlining a function or creating a new stack frame. That being said, I am not sure how much of this effort had been put into the "old standards" (which C99 certainly is). We could probably try to approach it practically by simple performance comparisons. warning - here goes my personal and subjective opinion - everyone has right for his own |
@marc-hb what are the CI, validation, size and perf results of your test PR ? No problem to remove, my understanding this was needed for historical XCC (which we no longer use). |
Just a test, see thesofproject#5212 Signed-off-by: Marc Herbert <marc.herbert@intel.com>
I tried to "quickly" drop the [*] I hate negative options; they always lead to double negations. Next I wanted to diffoscope a couple .c.obj file to find clues. However finding quickly which .c.obj files to diffoscope in the first place and removing a ton of noise requires dropping -g flags first. Dropping -g is easy enough in SOF (done it many time)s but In Zephyr I tried the following hack which gave me even weirder results: dropping July 2023 EDIT: maybe it was something like this old This is way out of the "let's have a quick look" territory now, this requires real, scheduled work. I didn't even look at XCC. --- a/cmake/compiler/gcc/compiler_flags.cmake
+++ b/cmake/compiler/gcc/compiler_flags.cmake
@@ -163,11 +163,11 @@ check_set_compiler_property(APPEND PROPERTY hosted -fno-freestanding)
set_compiler_property(PROPERTY freestanding -ffreestanding)
# Flag to enable debugging
-set_compiler_property(PROPERTY debug -g)
+# set_compiler_property(PROPERTY debug -g)
# GCC 11 by default emits DWARF version 5 which cannot be parsed by
# pyelftools. Can be removed once pyelftools supports v5.
-check_set_compiler_property(APPEND PROPERTY debug -gdwarf-4)
+#check_set_compiler_property(APPEND PROPERTY debug -gdwarf-4)
set_compiler_property(PROPERTY no_common -fno-common) @@ -129,7 +129,7 @@
fix_elf_addrs.py: Moving section k_sem_area to cached SRAM region
fix_elf_addrs.py: Moving section .bss to cached SRAM region
in current dir: /p5/build/SOF/sof; running command: /home/mherber2/zephyr-west/build-apl/zephyr/smex_ep/build/smex -l /home/mherber2/zephyr-west/build-sof-staging/sof/sof-apl.ldc /home/mherber2/zephyr-west/build-apl/zephyr/zephyr.elf
- Found 47 sections, listing valid sections......
+ Found 40 sections, listing valid sections......
No Start End Size Type Name
1 0xb000a000 0xb000a36b 0x36b TEXT .imr
2 0xbe008000 0xbe00816a 0x16a TEXT .WindowVectors.text
@@ -142,7 +142,7 @@
9 0xbe008300 0xbe008303 0x3 TEXT .KernelExceptionVector.text
10 0xbe008340 0xbe008352 0x12 TEXT .UserExceptionVector.text
11 0xbe0083c0 0xbe0083c6 0x6 TEXT .DoubleExceptionVector.text
- 12 0xbe008400 0xbe0337aa 0x2b3aa TEXT .text
+ 12 0xbe008400 0xbe0339a6 0x2b5a6 TEXT .text
13 0xbe034000 0xbe044c28 0x10c28 DATA .rodata
14 0xbe044c28 0xbe044c80 0x58 DATA initlevel
15 0xbe044c80 0xbe044ce0 0x60 DATA devices
@@ -163,8 +163,8 @@
30 0x000208a0 0x00036780 0x15ee0 DATA .static_log_entries
31 0x00036780 0x00036a50 0x2d0 DATA .fw_metadata
- module: input size 387148 (0x5e84c) bytes 31 sections
- module: text 178576 (0x2b990) bytes
+ module: input size 387656 (0x5ea48) bytes 31 sections
+ module: text 179084 (0x2bb8c) bytes
data 208572 (0x32ebc) bytes
bss 5312 (0x14c0) bytes
@@ -199,7 +199,7 @@
Module Reading /p5/build/SOF/zephyr-west/build-apl/zephyr/main.mod
- Found 45 sections, listing valid sections......
+ Found 38 sections, listing valid sections......
No LMA VMA End Size Type Name
1 0xbe008000 0xbe008000 0xbe00816a 0x16a TEXT .WindowVectors.text
2 0xbe008180 0xbe008180 0xbe0081a9 0x29 TEXT .Level2InterruptVector.text
@@ -211,7 +211,7 @@
8 0xbe008300 0xbe008300 0xbe008303 0x3 TEXT .KernelExceptionVector.text
9 0xbe008340 0xbe008340 0xbe008352 0x12 TEXT .UserExceptionVector.text
10 0xbe0083c0 0xbe0083c0 0xbe0083c6 0x6 TEXT .DoubleExceptionVector.text
- 11 0xbe008400 0xbe008400 0xbe0337aa 0x2b3aa TEXT .text
+ 11 0xbe008400 0xbe008400 0xbe0339a6 0x2b5a6 TEXT .text
12 0xbe034000 0xbe034000 0xbe044c28 0x10c28 DATA .rodata
13 0xbe044c28 0xbe044c28 0xbe044c80 0x58 DATA initlevel
14 0xbe044c80 0xbe044c80 0xbe044ce0 0x60 DATA devices
@@ -227,8 +227,8 @@
24 0xbe045c80 0xbe045c80 0xbe050480 0xa800 DATA .cached
25 0xbe051000 0xbe051000 0xbe0524c0 0x14c0 BSS .bss
- module: input size 293521 (0x47a91) bytes 25 sections
- module: text 177701 (0x2b625) bytes
+ module: input size 294029 (0x47c8d) bytes 25 sections
+ module: text 178209 (0x2b821) bytes
data 115820 (0x1c46c) bytes
bss 5312 (0x14c0) bytes
@@ -251,7 +251,7 @@
Entry point 0xbe008400
Totals Start End Size
- TEXT 0xbe008000 0xbe0337aa 0x2b7aa
+ TEXT 0xbe008000 0xbe0339a6 0x2b9a6
DATA 0xbe034000 0xbe050480 0x1c480
BSS 0xbe051000 0xbe0524c0 0x14c0
@@ -266,7 +266,7 @@
8 0xbe008300 0x3 0x9300 TEXT
9 0xbe008340 0x12 0x9340 TEXT
10 0xbe0083c0 0x6 0x93c0 TEXT
- 11 0xbe008400 0x2b3aa 0x9400 TEXT
+ 11 0xbe008400 0x2b5a6 0x9400 TEXT
12 0xbe034000 0x10c28 0x35000 DATA
13 0xbe044c28 0x58 0x45c28 DATA
14 0xbe044c80 0x60 0x45c80 DATA |
That does not seem to be a problem, see test PRs #5213 and #5645
Yes, the
However
AFAICT these complex C89/C99/C++ linkage rules affect only the non-static inline functions that no ever uses. I've only seen I don't think optimizers ever care about linkage or standards. They hopefully don't have to.
Yes, the only way to know for sure is to try and test. Afraid this looks like hard work though.
ALWAYS_INLINE is only used in shared Zephyr code, not in SOF code.
Great summary of the main problem with "everything and the kitchen sink" C++. Many features are awesome but others undesirable in embedded. Where to draw the line considering super scarce code review bandwidth? Trust the developers! Just kidding of course :-)
"at least new ones". Xtensa toolchains are co-generated when the chip is designed, that's the whole Xtensa "value proposition" and why they have funny names. Then it's a huge amount of work to upgrade the toolchain for older products. I think it's already been discussed but never happened AFAIK. In the real world: https://github.com/zephyrproject-rtos/zephyr/pulls?q=is%3Apr+xcc zephyrproject-rtos/zephyr#42972, 42939, 42494, 42017, 42012, ... @dcpleung any expert XCC opinion to share? |
AFAIK, the GCC portion of XCC can only support C89 in terms of inlining (specially, the GNU89 inlining semantics), unless they are no longer based on GCC 4.2. IIRC, some of the inlining semantics are actually reversed between GNU89 and C99. So one has to be really careful there. Though, if we are going to use XCC-Clang, none of this would be an issue. :) |
@marc-hb lets revisit this once we have fully aligned and stabilised with Zephyr. This interrupts the convergence flow atm and its a good thing to investigate once we have all native Zephyr APIs in use and stable. |
cc: @andrula-song , @singalsu , @lyakh , @ShriramShastry |
Add reference to -fno-inline-functions issue thesofproject#5212 Add examples of compilation failures without gnu99. Fixes commit e430629 ("xt-clang: Do not use -std=gnu99 for C++") that separated the comment from its code. Signed-off-by: Marc Herbert <marc.herbert@intel.com>
This is popping up again in optimization work. The RI-2020.5 xcc help says:
So basicly this is ensuring inlining is explicitly controlled and not decided by the compiler, and very specific to XCC. |
@kv2019i are you sure it's xcc-specific? https://developer.arm.com/documentation/dui0774/e/Compiler-Command-line-Options/-fno-inline-functions and actually also in https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html :
So, logically, |
FWIW XCC's front-end = gcc. I don't remember seeing any XCC specific command line option. Interesting post from Ian Lance Taylor shared by @lyakh somewhere else
|
I post one reason to keep -fno-inline-functions as build option. we can inline functions as we want to optimize performance, if let compiler to decide inline all possible functions to inline, then may be cause one function is extremely large and have potential issues and extra complex at asm level(like much more goto added). So I would suggest keep it and close this issue. |
How much? Last time I checked (a long time ago) it made very little difference.
What issues?
Optimized machine code stopped looking like its corresponding C source code a long time ago. It's not just inlining. |
I think in most cases the compiler is better at optimising such things for performance than us.
That does indeed make it a bit more difficult - if you have your C code with several functions, but the compiler merges them all into one, debugging that at assembly level can indeed become more difficult. But IMHO that's a price we can pay for probable optimisation gains. |
we can only inline time critical related(like in process/copy), and for prepare and params, keep as separate does not have harm on performance. |
Relating optimized assembly with its corresponding source is indeed a lost cause. So that's not a consideration. https://sourceware.org/gdb/onlinedocs/gdb/Optimized-Code.html etc. |
I made a test and get above data, origin is left, and remove -fno-inline-functions is right, you can see, that after inline allowed, tgl sof txt size increase about 1k, and check further performance data, you can see only mixout improved a lot, this is because mixout_process inside have some inlined function. so conclusion is: inline can improve performance when there is inline function in timing critical path, penality is: small txt size increase:0.2%. |
This explain why I've seen some small MCPS improvements from explicitly adding inline to some functions. But in code reviews some of those have got removed with assumption that compiler does the same work. If we keep this we should allow and use in code manual inline for obvious optimizations. |
@btian1 I would submit a PR that removes the CC ""no-inline" option via Kconfig as some older versions fo XCC required this (and I would disable for all Intel platforms as IIUC it was needed around APL era). |
@lgirdwood , do you mean remove -fno-line-functions in intel tgl mtl and afterwards platforms, and keep it with old platforms for XCC build? |
yes, this would allow other to validate thier CC version for each platform handles inline correctly. |
Please review as you expected or not, @lgirdwood |
closing this issue with #8033 merge. |
-fno-inline-functions was added in 2016 by giant commit c0dfb4e
core: initial import of open source DSP firmware
Then it was transferred from automake to CMake in 2019 by commit e0ba98d
cmake: add CMakeLists for firmware build
It was copied to the Zephyr build by initial Zephyr commit de41202
None of these explains why.
The text was updated successfully, but these errors were encountered: