Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why do we use -fno-inline-functions? #5212

Closed
marc-hb opened this issue Jan 14, 2022 · 26 comments
Closed

Why do we use -fno-inline-functions? #5212

marc-hb opened this issue Jan 14, 2022 · 26 comments
Assignees
Labels
enhancement New feature or request P3 Low-impact bugs or features

Comments

@marc-hb
Copy link
Collaborator

marc-hb commented Jan 14, 2022

-fno-inline-functions was added in 2016 by giant commit c0dfb4e core: initial import of open source DSP firmware

Then it was transferred from automake to CMake in 2019 by commit e0ba98d cmake: add CMakeLists for firmware build

It was copied to the Zephyr build by initial Zephyr commit de41202

None of these explains why.

@marc-hb marc-hb added the enhancement New feature or request label Jan 14, 2022
marc-hb added a commit to marc-hb/sof that referenced this issue Jan 14, 2022
Just a test, see thesofproject#5212

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented Jan 14, 2022

From @andyross about Zephyr:

FWIW: within Zephyr, almost all the spots where we're inlining for performance (or, very occasionally, correctness) are tagged ALWAYS_INLINE (which expands to the always_inline gcc attribute), and won't be affected. Still, seems like something that should be on absent weird compiler bugs.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jan 14, 2022

#5213 test removes the flag and shows "typical" failures, not worse than usual.

@lgirdwood
Copy link
Member

This was based on historical XCC settings so will need more analysis for code size, perf and stability.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Mar 30, 2022

cc: @aborisovich , @mmaka1 , @mwasko , @abonislawski

@aborisovich
Copy link
Contributor

aborisovich commented Apr 4, 2022

Ok so looking at standards and citating latest C99 documentation WG14 N1256:
6.7.4 § 3:

An inline definition of a function with external linkage shall not contain a definition of a
modifiable object with static storage duration, and shall not contain a reference to an
identifier with internal linkage

Well maybe we have a lot of definitions of modifiable objects with static storage duration inside a functions that have a external linkage? It seems true to me as there are tons of macros defining some static structs in the function bodies?

Next thing its probably so called "i need to control everything C programmer habits".
Standard says 6.7.4 § 5:

(...) Making a function an inline function suggests that calls to the function be as
fast as possible. The extent to which such suggestions are effective is
implementation-defined.

As far as I know there had been tons of effort invested into compilers so they can effectively make decisions on what is more effective - inlining a function or creating a new stack frame. That being said, I am not sure how much of this effort had been put into the "old standards" (which C99 certainly is).

We could probably try to approach it practically by simple performance comparisons.
Let's just try to turn this off (-fno-inline-functions flag), remove those ALWAYS_INLINE (replace then with stadard "inline" keyword hint) and see what a result will be.

warning - here goes my personal and subjective opinion - everyone has right for his own
I do not support using C for firmware development at all. Software of this complexity should be written using modern C++ language taking the tools we need and excluding the tools we should not use. The fact that we use C99 standard is another fiasco. Current C language standard is ISO/IEC 9899:2018. Xtensa toolchains support it for some time now (at least new ones).
I do understand that we may not benefit from C standard additions to standard library because we do not use a standard library, but there had been tons of bug fixes as far as I am aware and modern C standard should be used even if we do not
need "new features". We could make use of native alignof and alignas operators, xtensa newlibc probably defines those.

@lgirdwood
Copy link
Member

@marc-hb what are the CI, validation, size and perf results of your test PR ? No problem to remove, my understanding this was needed for historical XCC (which we no longer use).

marc-hb added a commit to marc-hb/sof that referenced this issue Apr 4, 2022
Just a test, see thesofproject#5212

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 4, 2022

I tried to "quickly" drop the -fno-inline-functions and compare binaries while compiling with the Zephyr SDK and I observed a strange and counter-intuitive[*] the expected result: dropping the flag increased the TEXT size a bit.

[*] I hate negative options; they always lead to double negations.

Next I wanted to diffoscope a couple .c.obj file to find clues. However finding quickly which .c.obj files to diffoscope in the first place and removing a ton of noise requires dropping -g flags first. Dropping -g is easy enough in SOF (done it many time)s but In Zephyr I tried the following hack which gave me even weirder results: dropping -g like this does far more than dropping the debug sections, dropping -g also changes the generated code! How come? Some post-processing script maybe?

July 2023 EDIT: maybe it was something like this old -g bug

This is way out of the "let's have a quick look" territory now, this requires real, scheduled work. I didn't even look at XCC.

--- a/cmake/compiler/gcc/compiler_flags.cmake
+++ b/cmake/compiler/gcc/compiler_flags.cmake
@@ -163,11 +163,11 @@ check_set_compiler_property(APPEND PROPERTY hosted -fno-freestanding)
 set_compiler_property(PROPERTY freestanding -ffreestanding)
 
 # Flag to enable debugging
-set_compiler_property(PROPERTY debug -g)
+# set_compiler_property(PROPERTY debug -g)
 
 # GCC 11 by default emits DWARF version 5 which cannot be parsed by
 # pyelftools. Can be removed once pyelftools supports v5.
-check_set_compiler_property(APPEND PROPERTY debug -gdwarf-4)
+#check_set_compiler_property(APPEND PROPERTY debug -gdwarf-4)
 
 set_compiler_property(PROPERTY no_common -fno-common)
@@ -129,7 +129,7 @@
 fix_elf_addrs.py: Moving section k_sem_area to cached SRAM region
 fix_elf_addrs.py: Moving section .bss to cached SRAM region
 in current dir: /p5/build/SOF/sof; running command: /home/mherber2/zephyr-west/build-apl/zephyr/smex_ep/build/smex -l /home/mherber2/zephyr-west/build-sof-staging/sof/sof-apl.ldc /home/mherber2/zephyr-west/build-apl/zephyr/zephyr.elf
-  Found 47 sections, listing valid sections......
+  Found 40 sections, listing valid sections......
 	No	Start		End		Size	Type	Name
 	1	0xb000a000	0xb000a36b	0x36b	TEXT	.imr
 	2	0xbe008000	0xbe00816a	0x16a	TEXT	.WindowVectors.text
@@ -142,7 +142,7 @@
 	9	0xbe008300	0xbe008303	0x3	TEXT	.KernelExceptionVector.text
 	10	0xbe008340	0xbe008352	0x12	TEXT	.UserExceptionVector.text
 	11	0xbe0083c0	0xbe0083c6	0x6	TEXT	.DoubleExceptionVector.text
-	12	0xbe008400	0xbe0337aa	0x2b3aa	TEXT	.text
+	12	0xbe008400	0xbe0339a6	0x2b5a6	TEXT	.text
 	13	0xbe034000	0xbe044c28	0x10c28	DATA	.rodata
 	14	0xbe044c28	0xbe044c80	0x58	DATA	initlevel
 	15	0xbe044c80	0xbe044ce0	0x60	DATA	devices
@@ -163,8 +163,8 @@
 	30	0x000208a0	0x00036780	0x15ee0	DATA	.static_log_entries
 	31	0x00036780	0x00036a50	0x2d0	DATA	.fw_metadata
 
- module: input size 387148 (0x5e84c) bytes 31 sections
- module: text 178576 (0x2b990) bytes
+ module: input size 387656 (0x5ea48) bytes 31 sections
+ module: text 179084 (0x2bb8c) bytes
     data 208572 (0x32ebc) bytes
     bss  5312 (0x14c0) bytes
 
@@ -199,7 +199,7 @@
 
 
 Module Reading /p5/build/SOF/zephyr-west/build-apl/zephyr/main.mod
-  Found 45 sections, listing valid sections......
+  Found 38 sections, listing valid sections......
 	No	LMA		VMA		End		Size	Type	Name
 	1	0xbe008000	0xbe008000	0xbe00816a	0x16a	TEXT	.WindowVectors.text
 	2	0xbe008180	0xbe008180	0xbe0081a9	0x29	TEXT	.Level2InterruptVector.text
@@ -211,7 +211,7 @@
 	8	0xbe008300	0xbe008300	0xbe008303	0x3	TEXT	.KernelExceptionVector.text
 	9	0xbe008340	0xbe008340	0xbe008352	0x12	TEXT	.UserExceptionVector.text
 	10	0xbe0083c0	0xbe0083c0	0xbe0083c6	0x6	TEXT	.DoubleExceptionVector.text
-	11	0xbe008400	0xbe008400	0xbe0337aa	0x2b3aa	TEXT	.text
+	11	0xbe008400	0xbe008400	0xbe0339a6	0x2b5a6	TEXT	.text
 	12	0xbe034000	0xbe034000	0xbe044c28	0x10c28	DATA	.rodata
 	13	0xbe044c28	0xbe044c28	0xbe044c80	0x58	DATA	initlevel
 	14	0xbe044c80	0xbe044c80	0xbe044ce0	0x60	DATA	devices
@@ -227,8 +227,8 @@
 	24	0xbe045c80	0xbe045c80	0xbe050480	0xa800	DATA	.cached
 	25	0xbe051000	0xbe051000	0xbe0524c0	0x14c0	BSS	.bss
 
- module: input size 293521 (0x47a91) bytes 25 sections
- module: text 177701 (0x2b625) bytes
+ module: input size 294029 (0x47c8d) bytes 25 sections
+ module: text 178209 (0x2b821) bytes
     data 115820 (0x1c46c) bytes
     bss  5312 (0x14c0) bytes
 
@@ -251,7 +251,7 @@
  Entry point 0xbe008400
 
 	Totals	Start		End		Size
-	TEXT	0xbe008000	0xbe0337aa	0x2b7aa
+	TEXT	0xbe008000	0xbe0339a6	0x2b9a6
 	DATA	0xbe034000	0xbe050480	0x1c480
 	BSS	0xbe051000	0xbe0524c0	0x14c0
 
@@ -266,7 +266,7 @@
 	8	0xbe008300	0x3		0x9300	TEXT
 	9	0xbe008340	0x12		0x9340	TEXT
 	10	0xbe0083c0	0x6		0x93c0	TEXT
-	11	0xbe008400	0x2b3aa		0x9400	TEXT
+	11	0xbe008400	0x2b5a6		0x9400	TEXT
 	12	0xbe034000	0x10c28		0x35000	DATA
 	13	0xbe044c28	0x58		0x45c28	DATA
 	14	0xbe044c80	0x60		0x45c80	DATA

@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 5, 2022

Well maybe we have a lot of definitions of modifiable objects with static storage duration inside a functions that have a external linkage? It seems true to me as there are tons of macros defining some static structs in the function bodies?

That does not seem to be a problem, see test PRs #5213 and #5645

As far as I know there had been tons of effort invested into compilers so they can effectively make decisions on what is more effective - inlining a function or creating a new stack frame.

Yes, the inline keyword seems ignored by all optimizers I tried, even in old toolchains.

-fno-inline-functions looks "stronger".

However static inline is still useful for linkage reasons: I think it's the only clean way to share functions in .h header files (@andyross?). Whether these functions in .h files are actually inlined or is a totally different story. I mean optimization and linkage are ideally two different topics (that the C and C++ languages unfortunately mix up with each other).

That being said, I am not sure how much of this effort had been put into the "old standards" (which C99 certainly is).

AFAICT these complex C89/C99/C++ linkage rules affect only the non-static inline functions that no ever uses. I've only seen static inline, no one seems to ever use either the plain inline or extern inline.

I don't think optimizers ever care about linkage or standards. They hopefully don't have to.

We could probably try to approach it practically by simple performance comparisons. Let's just try to turn this off (-fno-inline-functions flag),

Yes, the only way to know for sure is to try and test. Afraid this looks like hard work though.

remove those ALWAYS_INLINE (replace then with stadard "inline" keyword hint)

ALWAYS_INLINE is only used in shared Zephyr code, not in SOF code.

taking the tools we need and excluding the tools we should not use.

Great summary of the main problem with "everything and the kitchen sink" C++. Many features are awesome but others undesirable in embedded. Where to draw the line considering super scarce code review bandwidth? Trust the developers! Just kidding of course :-)

Current C language standard is ISO/IEC 9899:2018. Xtensa toolchains support it for some time now (at least new ones).

"at least new ones".

Xtensa toolchains are co-generated when the chip is designed, that's the whole Xtensa "value proposition" and why they have funny names. Then it's a huge amount of work to upgrade the toolchain for older products. I think it's already been discussed but never happened AFAIK.
July 2023 EDIT: for instance it did NOT happen for #7114

In the real world: https://github.com/zephyrproject-rtos/zephyr/pulls?q=is%3Apr+xcc zephyrproject-rtos/zephyr#42972, 42939, 42494, 42017, 42012, ... @dcpleung any expert XCC opinion to share?

@dcpleung
Copy link
Contributor

dcpleung commented Apr 5, 2022

AFAIK, the GCC portion of XCC can only support C89 in terms of inlining (specially, the GNU89 inlining semantics), unless they are no longer based on GCC 4.2. IIRC, some of the inlining semantics are actually reversed between GNU89 and C99. So one has to be really careful there. Though, if we are going to use XCC-Clang, none of this would be an issue. :)

@lgirdwood
Copy link
Member

@marc-hb lets revisit this once we have fully aligned and stabilised with Zephyr. This interrupts the convergence flow atm and its a good thing to investigate once we have all native Zephyr APIs in use and stable.

@marc-hb marc-hb added the P3 Low-impact bugs or features label Apr 5, 2022
@marc-hb
Copy link
Collaborator Author

marc-hb commented Aug 24, 2022

cc: @andrula-song , @singalsu , @lyakh , @ShriramShastry

marc-hb added a commit to marc-hb/sof that referenced this issue Nov 22, 2022
Add reference to -fno-inline-functions issue
thesofproject#5212

Add examples of compilation failures without gnu99.

Fixes commit e430629 ("xt-clang: Do not use -std=gnu99 for C++")
that separated the comment from its code.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
lgirdwood pushed a commit that referenced this issue Nov 30, 2022
Add reference to -fno-inline-functions issue
#5212

Add examples of compilation failures without gnu99.

Fixes commit e430629 ("xt-clang: Do not use -std=gnu99 for C++")
that separated the comment from its code.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@kv2019i
Copy link
Collaborator

kv2019i commented Jul 14, 2023

This is popping up again in optimization work. The RI-2020.5 xcc help says:

-fno-inline-functions:  Do not inline static functions not marked as inline

So basicly this is ensuring inlining is explicitly controlled and not decided by the compiler, and very specific to XCC.

@lyakh
Copy link
Collaborator

lyakh commented Jul 17, 2023

-fno-inline-functions

@kv2019i are you sure it's xcc-specific? https://developer.arm.com/documentation/dui0774/e/Compiler-Command-line-Options/-fno-inline-functions and actually also in https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html :

-finline-functions

Consider all functions for inlining, even if they are not declared inline. The compiler heuristically decides which functions are worth integrating in this way.

So, logically, -fno-inline-functions doesn't consider functions for inlining, i.e. switches off automatic inlining, but doesn't change handling of explicitly inline functions

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 17, 2023

@kv2019i are you sure it's xcc-specific?

FWIW XCC's front-end = gcc. I don't remember seeing any XCC specific command line option.

Interesting post from Ian Lance Taylor shared by @lyakh somewhere else
https://gcc.gnu.org/legacy-ml/gcc/2006-11/msg00006.html

Handling of extern inline in c99 mode
From: Ian Lance Taylor
Cc: gcc at gcc dot gnu dot org
Date: 01 Nov 2006 08:02:55 -0800

@btian1
Copy link
Contributor

btian1 commented Jul 18, 2023

I post one reason to keep -fno-inline-functions as build option.

we can inline functions as we want to optimize performance, if let compiler to decide inline all possible functions to inline, then may be cause one function is extremely large and have potential issues and extra complex at asm level(like much more goto added).

So I would suggest keep it and close this issue.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 18, 2023

then may be cause one function is extremely large

How much? Last time I checked (a long time ago) it made very little difference.

have potential issues

What issues?

and extra complex at asm level(like much more goto added).

Optimized machine code stopped looking like its corresponding C source code a long time ago. It's not just inlining.
https://queue.acm.org/detail.cfm?id=3212479

@lyakh
Copy link
Collaborator

lyakh commented Jul 18, 2023

we can inline functions as we want to optimize performance,

I think in most cases the compiler is better at optimising such things for performance than us.

if let compiler to decide inline all possible functions to inline, then may be cause one function is extremely large and have potential issues and extra complex at asm level(like much more goto added).

That does indeed make it a bit more difficult - if you have your C code with several functions, but the compiler merges them all into one, debugging that at assembly level can indeed become more difficult. But IMHO that's a price we can pay for probable optimisation gains.

@btian1
Copy link
Contributor

btian1 commented Jul 18, 2023

we can only inline time critical related(like in process/copy), and for prepare and params, keep as separate does not have harm on performance.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 18, 2023

if you have your C code with several functions, but the compiler merges them all into one, debugging that at assembly level can indeed become more difficult

Relating optimized assembly with its corresponding source is indeed a lost cause. So that's not a consideration.

https://sourceware.org/gdb/onlinedocs/gdb/Optimized-Code.html
https://www.ibm.com/docs/en/xl-c-aix/13.1.2?topic=guide-debugging-optimized-code
https://queue.acm.org/detail.cfm?id=3212479

etc.

@btian1
Copy link
Contributor

btian1 commented Aug 3, 2023

image

I made a test and get above data, origin is left, and remove -fno-inline-functions is right, you can see, that after inline allowed, tgl sof txt size increase about 1k, and check further performance data, you can see only mixout improved a lot, this is because mixout_process inside have some inlined function.

so conclusion is: inline can improve performance when there is inline function in timing critical path, penality is: small txt size increase:0.2%.

@singalsu
Copy link
Collaborator

singalsu commented Aug 7, 2023

This explain why I've seen some small MCPS improvements from explicitly adding inline to some functions. But in code reviews some of those have got removed with assumption that compiler does the same work. If we keep this we should allow and use in code manual inline for obvious optimizations.

@lgirdwood
Copy link
Member

@btian1 I would submit a PR that removes the CC ""no-inline" option via Kconfig as some older versions fo XCC required this (and I would disable for all Intel platforms as IIUC it was needed around APL era).

@btian1
Copy link
Contributor

btian1 commented Aug 9, 2023

@lgirdwood , do you mean remove -fno-line-functions in intel tgl mtl and afterwards platforms, and keep it with old platforms for XCC build?

@lgirdwood
Copy link
Member

@lgirdwood , do you mean remove -fno-line-functions in intel tgl mtl and afterwards platforms, and keep it with old platforms for XCC build?

yes, this would allow other to validate thier CC version for each platform handles inline correctly.

@btian1
Copy link
Contributor

btian1 commented Aug 14, 2023

#8033

Please review as you expected or not, @lgirdwood

@btian1
Copy link
Contributor

btian1 commented Aug 18, 2023

closing this issue with #8033 merge.

@btian1 btian1 closed this as completed Aug 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request P3 Low-impact bugs or features
Projects
None yet
Development

No branches or pull requests

8 participants