
sys/pm_layered: align pm_blocker_t for speed #18846

Merged
merged 3 commits into RIOT-OS:master on Nov 8, 2022


kfessel
Contributor

@kfessel kfessel commented Nov 4, 2022

pm_(un)block: add attribute optimize(3), which shortens the hot path by moving the jump to the panic handler behind the return

pm_get_blocker: use plain assignment (=) instead of memcpy to ease readability
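A minimal sketch of the pm_get_blocker change, assuming a hypothetical four-mode stand-in type `demo_blocker_t` (the real `pm_blocker_t` layout and mode count may differ) and leaving out the interrupt locking the real function performs. Both versions return the same bytes; the assignment simply states the intent directly and lets the compiler choose the copy width.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for pm_blocker_t: four per-mode block counters. */
typedef struct {
    uint8_t blockers[4];
} demo_blocker_t;

static demo_blocker_t demo_state = { .blockers = { 0, 2, 0, 1 } };

/* Old style: copy the struct through memcpy. */
static demo_blocker_t get_blocker_memcpy(void)
{
    demo_blocker_t result;
    memcpy(&result, &demo_state, sizeof(result));
    return result;
}

/* New style: plain struct assignment; the compiler picks the copy width. */
static demo_blocker_t get_blocker_assign(void)
{
    demo_blocker_t result = demo_state;
    return result;
}
```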

Contribution description

This PR aligns pm_blocker_t, which enables the compiler to use word loads instead of byte loads on Cortex-M0 for whole-struct accesses (pm_get_blocker and pm_set_lowest used memcpy or byte-wise access to read pm_blocker). With this PR, such an access becomes a single ldr (word load) plus some register operations.
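One way to express such an alignment in standard C11 is an `alignas` specifier on the counter array, sketched below with a hypothetical `aligned_blocker_t`; the PR itself may use a different mechanism. The `static_assert`s document the intended layout at compile time: still four one-byte counters, but word aligned, so a whole-struct read can be a single word load.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-in for pm_blocker_t. alignas(uint32_t) on the member
 * raises the alignment of the whole struct to that of uint32_t. */
typedef struct {
    alignas(uint32_t) uint8_t blockers[4];
} aligned_blocker_t;

/* Compile-time checks: size unchanged, alignment now that of a word. */
static_assert(sizeof(aligned_blocker_t) == 4, "four one-byte counters");
static_assert(alignof(aligned_blocker_t) == alignof(uint32_t), "word aligned");
```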

Testing procedure

Read the generated assembly.

Issues/PRs references

#17607 started the journey
#18821 opened the rabbit hole
#18842

@kfessel kfessel requested a review from kaspar030 as a code owner November 4, 2022 14:12
@github-actions github-actions bot added the Area: sys Area: System label Nov 4, 2022
@kfessel kfessel added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed Area: sys Area: System labels Nov 4, 2022
@riot-ci

riot-ci commented Nov 4, 2022

Murdock results

✔️ PASSED

72a1e93 sys/pm_layered: pm_get_blocker = instead of memcopy -ease readability

Success: 2000, Failures: 0, Total: 2000, Runtime: 06m:47s


This only reflects a subset of all builds from https://ci-prod.riot-os.org. Please refer to https://ci.riot-os.org for a complete build for now.

@kfessel kfessel requested a review from benpicco November 4, 2022 14:27
@benpicco benpicco requested a review from jue89 November 4, 2022 14:30
@benpicco
Contributor

benpicco commented Nov 4, 2022

When you say speed, can you put any numbers on this?

@kfessel
Contributor Author

kfessel commented Nov 4, 2022

I don't have numbers, but I read a lot of assembly for the STM32F767 and the SAMR21.

@kfessel
Contributor Author

kfessel commented Nov 4, 2022

nucleo-f767zi, master:


Disassembly of section .text.pm_set_lowest:

00000000 <pm_set_lowest>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8410 	mrs	r4, PRIMASK
   6:	b672      	cpsid	i
   8:	4b07      	ldr	r3, [pc, #28]	; (28 <pm_set_lowest+0x28>)
   a:	789a      	ldrb	r2, [r3, #2]
   c:	b93a      	cbnz	r2, 1e <pm_set_lowest+0x1e>
   e:	785a      	ldrb	r2, [r3, #1]
  10:	b942      	cbnz	r2, 24 <pm_set_lowest+0x24>
  12:	7818      	ldrb	r0, [r3, #0]
  14:	3800      	subs	r0, #0
  16:	bf18      	it	ne
  18:	2001      	movne	r0, #1
  1a:	f7ff fffe 	bl	0 <pm_set>
  1e:	f384 8810 	msr	PRIMASK, r4
  22:	bd10      	pop	{r4, pc}
  24:	2002      	movs	r0, #2
  26:	e7f8      	b.n	1a <pm_set_lowest+0x1a>
  28:	00000000 	.word	0x00000000

Disassembly of section .text.pm_block:

00000000 <pm_block>:
   0:	b508      	push	{r3, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_block+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2bff      	cmp	r3, #255	; 0xff
   e:	d101      	bne.n	14 <pm_block+0x14>
  10:	f7ff fffe 	bl	0 <_assert_panic>
  14:	3301      	adds	r3, #1
  16:	5413      	strb	r3, [r2, r0]
  18:	f381 8810 	msr	PRIMASK, r1
  1c:	bd08      	pop	{r3, pc}
  1e:	bf00      	nop
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b508      	push	{r3, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a04      	ldr	r2, [pc, #16]	; (1c <pm_unblock+0x1c>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	b90b      	cbnz	r3, 12 <pm_unblock+0x12>
   e:	f7ff fffe 	bl	0 <_assert_panic>
  12:	3b01      	subs	r3, #1
  14:	5413      	strb	r3, [r2, r0]
  16:	f381 8810 	msr	PRIMASK, r1
  1a:	bd08      	pop	{r3, pc}
  1c:	00000000 	.word	0x00000000

Disassembly of section .text.pm_get_blocker:

00000000 <pm_get_blocker>:
   0:	b082      	sub	sp, #8
   2:	f3ef 8210 	mrs	r2, PRIMASK
   6:	b672      	cpsid	i
   8:	4b0b      	ldr	r3, [pc, #44]	; (38 <pm_get_blocker+0x38>)
   a:	8819      	ldrh	r1, [r3, #0]
   c:	789b      	ldrb	r3, [r3, #2]
   e:	f8ad 1000 	strh.w	r1, [sp]
  12:	f88d 3002 	strb.w	r3, [sp, #2]
  16:	f382 8810 	msr	PRIMASK, r2
  1a:	9b00      	ldr	r3, [sp, #0]
  1c:	2000      	movs	r0, #0
  1e:	b2da      	uxtb	r2, r3
  20:	f362 0007 	bfi	r0, r2, #0, #8
  24:	f3c3 2207 	ubfx	r2, r3, #8, #8
  28:	f3c3 4307 	ubfx	r3, r3, #16, #8
  2c:	f362 200f 	bfi	r0, r2, #8, #8
  30:	f363 4017 	bfi	r0, r3, #16, #8
  34:	b002      	add	sp, #8
  36:	4770      	bx	lr
  38:	00000000 	.word	0x00000000

This PR:

Disassembly of section .text.pm_set_lowest:

00000000 <pm_set_lowest>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8410 	mrs	r4, PRIMASK
   6:	b672      	cpsid	i
   8:	4b07      	ldr	r3, [pc, #28]	; (28 <pm_set_lowest+0x28>)
   a:	789a      	ldrb	r2, [r3, #2]
   c:	b93a      	cbnz	r2, 1e <pm_set_lowest+0x1e>
   e:	785a      	ldrb	r2, [r3, #1]
  10:	b942      	cbnz	r2, 24 <pm_set_lowest+0x24>
  12:	7818      	ldrb	r0, [r3, #0]
  14:	3800      	subs	r0, #0
  16:	bf18      	it	ne
  18:	2001      	movne	r0, #1
  1a:	f7ff fffe 	bl	0 <pm_set>
  1e:	f384 8810 	msr	PRIMASK, r4
  22:	bd10      	pop	{r4, pc}
  24:	2002      	movs	r0, #2
  26:	e7f8      	b.n	1a <pm_set_lowest+0x1a>
  28:	00000000 	.word	0x00000000

Disassembly of section .text.pm_block:

00000000 <pm_block>:
   0:	b508      	push	{r3, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_block+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2bff      	cmp	r3, #255	; 0xff
   e:	d004      	beq.n	1a <pm_block+0x1a>
  10:	3301      	adds	r3, #1
  12:	5413      	strb	r3, [r2, r0]
  14:	f381 8810 	msr	PRIMASK, r1
  18:	bd08      	pop	{r3, pc}
  1a:	f7ff fffe 	bl	0 <_assert_panic>
  1e:	bf00      	nop
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b508      	push	{r3, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a04      	ldr	r2, [pc, #16]	; (1c <pm_unblock+0x1c>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	b123      	cbz	r3, 18 <pm_unblock+0x18>
   e:	3b01      	subs	r3, #1
  10:	5413      	strb	r3, [r2, r0]
  12:	f381 8810 	msr	PRIMASK, r1
  16:	bd08      	pop	{r3, pc}
  18:	f7ff fffe 	bl	0 <_assert_panic>
  1c:	00000000 	.word	0x00000000

Disassembly of section .text.pm_get_blocker:

00000000 <pm_get_blocker>:
   0:	b082      	sub	sp, #8
   2:	f3ef 8310 	mrs	r3, PRIMASK
   6:	b672      	cpsid	i
   8:	4a03      	ldr	r2, [pc, #12]	; (18 <pm_get_blocker+0x18>)
   a:	6812      	ldr	r2, [r2, #0]
   c:	9200      	str	r2, [sp, #0]
   e:	f383 8810 	msr	PRIMASK, r3
  12:	9800      	ldr	r0, [sp, #0]
  14:	b002      	add	sp, #8
  16:	4770      	bx	lr
  18:	00000000 	.word	0x00000000

samr21, before this PR:

00000000 <pm_set_lowest>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8410 	mrs	r4, PRIMASK
   6:	b672      	cpsid	i
   8:	2304      	movs	r3, #4
   a:	4a08      	ldr	r2, [pc, #32]	; (2c <pm_set_lowest+0x2c>)
   c:	0018      	movs	r0, r3
   e:	3b01      	subs	r3, #1
  10:	5c99      	ldrb	r1, [r3, r2]
  12:	2900      	cmp	r1, #0
  14:	d105      	bne.n	22 <pm_set_lowest+0x22>
  16:	2b00      	cmp	r3, #0
  18:	d1f8      	bne.n	c <pm_set_lowest+0xc>
  1a:	0018      	movs	r0, r3
  1c:	f7ff fffe 	bl	0 <pm_set>
  20:	e001      	b.n	26 <pm_set_lowest+0x26>
  22:	2804      	cmp	r0, #4
  24:	d1fa      	bne.n	1c <pm_set_lowest+0x1c>
  26:	f384 8810 	msr	PRIMASK, r4
  2a:	bd10      	pop	{r4, pc}
  2c:	00000000 	.word	0x00000000

Disassembly of section .text.pm_block:

00000000 <pm_block>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_block+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2bff      	cmp	r3, #255	; 0xff
   e:	d101      	bne.n	14 <pm_block+0x14>
  10:	f7ff fffe 	bl	0 <_assert_panic>
  14:	3301      	adds	r3, #1
  16:	5413      	strb	r3, [r2, r0]
  18:	f381 8810 	msr	PRIMASK, r1
  1c:	bd10      	pop	{r4, pc}
  1e:	46c0      	nop			; (mov r8, r8)
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_unblock+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2b00      	cmp	r3, #0
   e:	d101      	bne.n	14 <pm_unblock+0x14>
  10:	f7ff fffe 	bl	0 <_assert_panic>
  14:	3b01      	subs	r3, #1
  16:	5413      	strb	r3, [r2, r0]
  18:	f381 8810 	msr	PRIMASK, r1
  1c:	bd10      	pop	{r4, pc}
  1e:	46c0      	nop			; (mov r8, r8)
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_get_blocker:

00000000 <pm_get_blocker>:
   0:	b513      	push	{r0, r1, r4, lr}
   2:	f3ef 8410 	mrs	r4, PRIMASK
   6:	b672      	cpsid	i
   8:	2204      	movs	r2, #4
   a:	4668      	mov	r0, sp
   c:	490a      	ldr	r1, [pc, #40]	; (38 <pm_get_blocker+0x38>)
   e:	f7ff fffe 	bl	0 <memcpy>
  12:	f384 8810 	msr	PRIMASK, r4
  16:	9b00      	ldr	r3, [sp, #0]
  18:	24ff      	movs	r4, #255	; 0xff
  1a:	0018      	movs	r0, r3
  1c:	0a19      	lsrs	r1, r3, #8
  1e:	0c1a      	lsrs	r2, r3, #16
  20:	4021      	ands	r1, r4
  22:	4020      	ands	r0, r4
  24:	0209      	lsls	r1, r1, #8
  26:	4022      	ands	r2, r4
  28:	0412      	lsls	r2, r2, #16
  2a:	4308      	orrs	r0, r1
  2c:	0e1b      	lsrs	r3, r3, #24
  2e:	061b      	lsls	r3, r3, #24
  30:	4310      	orrs	r0, r2
  32:	4318      	orrs	r0, r3
  34:	bd16      	pop	{r1, r2, r4, pc}
  36:	46c0      	nop			; (mov r8, r8)
  38:	00000000 	.word	0x00000000

samr21, after this PR:

Disassembly of section .text.pm_set_lowest:

00000000 <pm_set_lowest>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8410 	mrs	r4, PRIMASK
   6:	b672      	cpsid	i
   8:	2304      	movs	r3, #4
   a:	4a08      	ldr	r2, [pc, #32]	; (2c <pm_set_lowest+0x2c>)
   c:	0018      	movs	r0, r3
   e:	3b01      	subs	r3, #1
  10:	5c99      	ldrb	r1, [r3, r2]
  12:	2900      	cmp	r1, #0
  14:	d105      	bne.n	22 <pm_set_lowest+0x22>
  16:	2b00      	cmp	r3, #0
  18:	d1f8      	bne.n	c <pm_set_lowest+0xc>
  1a:	0018      	movs	r0, r3
  1c:	f7ff fffe 	bl	0 <pm_set>
  20:	e001      	b.n	26 <pm_set_lowest+0x26>
  22:	2804      	cmp	r0, #4
  24:	d1fa      	bne.n	1c <pm_set_lowest+0x1c>
  26:	f384 8810 	msr	PRIMASK, r4
  2a:	bd10      	pop	{r4, pc}
  2c:	00000000 	.word	0x00000000

Disassembly of section .text.pm_block:

00000000 <pm_block>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_block+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2bff      	cmp	r3, #255	; 0xff
   e:	d004      	beq.n	1a <pm_block+0x1a>
  10:	3301      	adds	r3, #1
  12:	5413      	strb	r3, [r2, r0]
  14:	f381 8810 	msr	PRIMASK, r1
  18:	bd10      	pop	{r4, pc}
  1a:	f7ff fffe 	bl	0 <_assert_panic>
  1e:	46c0      	nop			; (mov r8, r8)
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_unblock+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2b00      	cmp	r3, #0
   e:	d004      	beq.n	1a <pm_unblock+0x1a>
  10:	3b01      	subs	r3, #1
  12:	5413      	strb	r3, [r2, r0]
  14:	f381 8810 	msr	PRIMASK, r1
  18:	bd10      	pop	{r4, pc}
  1a:	f7ff fffe 	bl	0 <_assert_panic>
  1e:	46c0      	nop			; (mov r8, r8)
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_get_blocker:

00000000 <pm_get_blocker>:
   0:	f3ef 8310 	mrs	r3, PRIMASK
   4:	b672      	cpsid	i
   6:	4a02      	ldr	r2, [pc, #8]	; (10 <pm_get_blocker+0x10>)
   8:	6810      	ldr	r0, [r2, #0]
   a:	f383 8810 	msr	PRIMASK, r3
   e:	4770      	bx	lr
  10:	00000000 	.word	0x00000000

@jue89
Contributor

jue89 commented Nov 4, 2022

I'm a huge fan of toggling GPIOs and comparing before and after with an oscilloscope. On most platforms the GPIO peripheral is on the same clock domain as the CPU and should introduce a fixed latency, so this should make any improvement visible.

@benpicco
Contributor

benpicco commented Nov 4, 2022

> I'm a huge fan of toggling GPIOs and comparing before and after with an oscilloscope. On most platforms the GPIO peripheral is on the same clock domain as the CPU and should introduce a fixed latency, so this should make any improvement visible.

Not sure this is really necessary if we already see a reduction in generated code.

@kfessel kfessel force-pushed the p-pm-layerd-speedup1 branch from 92408cf to a036042 Compare November 4, 2022 15:52
@github-actions github-actions bot added the Area: sys Area: System label Nov 4, 2022
@kfessel kfessel force-pushed the p-pm-layerd-speedup1 branch from a036042 to 72a1e93 Compare November 4, 2022 16:00
@jue89
Contributor

jue89 commented Nov 4, 2022

I changed the pm shell command as shown below and ran the tests/periph_pm application on the samr30-xpro:

```diff
diff --git a/sys/shell/cmds/pm.c b/sys/shell/cmds/pm.c
index 1074bad99e..7178e1f300 100644
--- a/sys/shell/cmds/pm.c
+++ b/sys/shell/cmds/pm.c
@@ -26,6 +26,7 @@
 
 #include "periph/pm.h"
 #include "shell.h"
+#include "board.h"
 
 #ifdef MODULE_PM_LAYERED
 #include "pm_layered.h"
@@ -76,7 +77,9 @@ static int cmd_block(char *arg)
     printf("Blocking power mode %d.\n", mode);
     fflush(stdout);
 
+    LED0_OFF;
     pm_block(mode);
+    LED0_ON;
 
     return 0;
 }
@@ -117,7 +120,9 @@ static int cmd_unblock(char *arg)
     printf("Unblocking power mode %d.\n", mode);
     fflush(stdout);
 
+    LED1_OFF;
     pm_unblock(mode);
+    LED1_ON;
 
     return 0;
 }
```

On master (ccbb304) I get:

- pm_block: 1.16 µs
- pm_unblock: 1.00 µs to 1.16 µs

With this PR (72a1e93) I get:

- pm_block: 1.00 µs to 1.16 µs
- pm_unblock: 1.00 µs to 1.16 µs

Am I holding it wrong?

But I wouldn't block if other CPUs benefit from this patch!

@kfessel
Contributor Author

kfessel commented Nov 4, 2022

@jue89

> Am I holding it wrong?

No, your measurements are right. The improvements for block and unblock are small: they would be zero if the assert were removed, and they might be larger with the more verbose assert. I would expect 1-2 fewer cycles spent in block and unblock on a Cortex-M0 CPU, because the not-taken jump to `_assert_panic` is no longer fetched in the direct path of the program counter (it moved from offset 0x10 or 0x0e to the end of the function). This is not due to the alignment but to `__attribute__((optimize(3)))`. The alignment does not benefit block and unblock, since they always use byte accesses.
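`optimize(3)` is what got GCC to place the not-taken `_assert_panic` call after the return. A different, compiler-portable way to ask for the same layout is a branch hint via `__builtin_expect`, which both GCC and Clang provide. This is only a sketch and not what the PR does; whether the cold call actually moves out of line is up to the optimizer. `demo_block`, `demo_panic`, `demo_blockers`, and `UNLIKELY` are hypothetical names, and the interrupt locking of the real function is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Branch hint: tell the compiler the condition is expected to be false. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

static uint8_t demo_blockers[4];

/* Hypothetical stand-in for _assert_panic(); never returns. */
static _Noreturn void demo_panic(void)
{
    abort();
}

void demo_block(unsigned mode)
{
    /* The real pm_block() disables interrupts here. */
    if (UNLIKELY(demo_blockers[mode] == 255)) {
        demo_panic();   /* cold path: candidate for placement after the return */
    }
    demo_blockers[mode]++;
    /* ...and restores them here. */
}
```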

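Cortex-M0+ (SAMR21), hot path up to the returning `pop`:
- block runs from 0 to 0x1c before and from 0 to 0x18 after this PR
- unblock runs from 0 to 0x1c before and from 0 to 0x18 after this PR

Cortex-M7 (STM32F767):
- block runs from 0 to 0x1c before and from 0 to 0x18 after this PR
- unblock runs from 0 to 0x1a before and from 0 to 0x16 after this PR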

The improvements from the alignment are very obvious in pm_get_blocker: on the M0, the memcpy call was replaced by a single word load, and on the M7, the halfword and byte loads (ldrh + ldrb) and the bitfield reassembly were replaced by a single ldr.

For pm_set_lowest, any gain would come from the alignment. pm_blocker might have been misaligned before: on the M0, which cannot perform unaligned word loads at all, the compiler then has to use byte loads (or memcpy) instead of one ldr; on the M7, the parts of pm_blocker might lie in two different cache lines. If it happened to be aligned already, aligning it gains nothing.
With this PR it is aligned, so one ldr is one memory access.
(pm_get_blocker shows how the reads may have been split before; after this PR they aren't.)
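The cache-line argument can be checked with a little address arithmetic. The sketch below uses a hypothetical helper `crosses_line` that tests whether an access of `size` bytes touches two lines. With 32-byte lines (the Cortex-M7 L1 cache line size), a 4-byte read at offset 30 straddles two lines, while any 4-byte-aligned read stays within one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if an access of `size` bytes starting at `addr` touches two lines of
 * `line_size` bytes. `line_size` must be a power of two. */
static bool crosses_line(uintptr_t addr, uintptr_t size, uintptr_t line_size)
{
    uintptr_t first_line = addr & ~(line_size - 1);            /* line of first byte */
    uintptr_t last_line = (addr + size - 1) & ~(line_size - 1); /* line of last byte */
    return first_line != last_line;
}
```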

Finally, these gains depend on memory speed. Microchip states somewhere that a memory access takes one bus cycle, but I don't know which bus they are referring to.

@kfessel
Contributor Author

kfessel commented Nov 4, 2022

@jue89:
I thought there might be another gain from switching to atomic access, since I saw no stack frame being built for block and unblock when using atomics. It turned out that the stack frame was only missing because I hadn't asserted in the atomic variants; the atomic versions with assert also build a stack frame.

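```c
/* Experimental: select a variant by toggling the #if / #elif conditions. */
__attribute__((optimize(3)))
void pm_block(unsigned mode)
{
    DEBUG("[pm_layered] pm_block(%u)\n", mode);
#if 1
    /* Variant 1: atomic increment checked by assert().
     * Note: with NDEBUG the whole assert(), including the increment, is removed. */
    assert(atomic_fetch_add(&pm_blocker.blockers[mode], 1) < 255);
#elif 1
    /* Variant 2: atomic increment without assert() */
    atomic_fetch_add(&pm_blocker.blockers[mode], 1);
#else
    /* Variant 3: increment under irq_disable() */
    unsigned state = irq_disable();
    assert(pm_blocker.blockers[mode] != 255);
    pm_blocker.blockers[mode]++;
    irq_restore(state);
#endif
}

__attribute__((optimize(3)))
void pm_unblock(unsigned mode)
{
    DEBUG("[pm_layered] pm_unblock(%u)\n", mode);
#if 1
    /* Variant 1: atomic decrement checked by assert() (same NDEBUG caveat) */
    assert(atomic_fetch_sub(&pm_blocker.blockers[mode], 1) > 0);
#elif 1
    /* Variant 2: atomic decrement without assert() */
    atomic_fetch_sub(&pm_blocker.blockers[mode], 1);
#else
    /* Variant 3: decrement under irq_disable() */
    unsigned state = irq_disable();
    assert(pm_blocker.blockers[mode] > 0);
    pm_blocker.blockers[mode]--;
    irq_restore(state);
#endif
}
```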

Interestingly, `atomic_fetch_add` from `<stdatomic.h>` is type-generic (it applies to different datatypes) even in C; with GCC it is a macro around the `__atomic_fetch_add` builtin.

For the test I kept the same struct (I did not change the type of `.blockers[]` to an atomic type), and it compiled to the same code as with the changed types (this might not work on all architectures).
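In variant 1 above, the `atomic_fetch_add`/`atomic_fetch_sub` call sits inside `assert()`, so building with `NDEBUG` would drop the counter update together with the check. A minimal C11 sketch that keeps the read-modify-write outside the assertion, assuming a hypothetical `_Atomic uint8_t` array (`demo_counters`) rather than RIOT's actual `pm_blocker`, which the comment above notes was not declared atomic:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-mode counters declared with an atomic type. */
static _Atomic uint8_t demo_counters[4];

void demo_atomic_block(unsigned mode)
{
    /* Update first, then check: the update survives NDEBUG builds. */
    uint8_t prev = atomic_fetch_add(&demo_counters[mode], 1);
    assert(prev < 255);   /* fires if the counter just wrapped to 0 */
    (void)prev;           /* silence unused-variable warning under NDEBUG */
}

void demo_atomic_unblock(unsigned mode)
{
    uint8_t prev = atomic_fetch_sub(&demo_counters[mode], 1);
    assert(prev > 0);     /* fires if the counter just wrapped to 255 */
    (void)prev;
}
```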

STM32F767, atomic variant without assert:

00000000 <pm_block>:
   0:	4b06      	ldr	r3, [pc, #24]	; (1c <pm_block+0x1c>)
   2:	f3bf 8f5b 	dmb	ish
   6:	4418      	add	r0, r3
   8:	e8d0 3f4f 	ldrexb	r3, [r0]
   c:	3301      	adds	r3, #1
   e:	e8c0 3f42 	strexb	r2, r3, [r0]
  12:	2a00      	cmp	r2, #0
  14:	d1f8      	bne.n	8 <pm_block+0x8>
  16:	f3bf 8f5b 	dmb	ish
  1a:	4770      	bx	lr
  1c:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	4b06      	ldr	r3, [pc, #24]	; (1c <pm_unblock+0x1c>)
   2:	f3bf 8f5b 	dmb	ish
   6:	4418      	add	r0, r3
   8:	e8d0 3f4f 	ldrexb	r3, [r0]
   c:	3301      	adds	r3, #1
   e:	e8c0 3f42 	strexb	r2, r3, [r0]
  12:	2a00      	cmp	r2, #0
  14:	d1f8      	bne.n	8 <pm_unblock+0x8>
  16:	f3bf 8f5b 	dmb	ish
  1a:	4770      	bx	lr
  1c:	00000000 	.word	0x00000000

STM32F767, atomic variant with assert:

00000000 <pm_block>:
   0:	b508      	push	{r3, lr}
   2:	f3bf 8f5b 	dmb	ish
   6:	4b08      	ldr	r3, [pc, #32]	; (28 <pm_block+0x28>)
   8:	4418      	add	r0, r3
   a:	e8d0 3f4f 	ldrexb	r3, [r0]
   e:	1c5a      	adds	r2, r3, #1
  10:	e8c0 2f41 	strexb	r1, r2, [r0]
  14:	2900      	cmp	r1, #0
  16:	d1f8      	bne.n	a <pm_block+0xa>
  18:	b2db      	uxtb	r3, r3
  1a:	f3bf 8f5b 	dmb	ish
  1e:	2bff      	cmp	r3, #255	; 0xff
  20:	d000      	beq.n	24 <pm_block+0x24>
  22:	bd08      	pop	{r3, pc}
  24:	f7ff fffe 	bl	0 <_assert_panic>
  28:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b508      	push	{r3, lr}
   2:	f3bf 8f5b 	dmb	ish
   6:	4b08      	ldr	r3, [pc, #32]	; (28 <pm_unblock+0x28>)
   8:	4418      	add	r0, r3
   a:	e8d0 3f4f 	ldrexb	r3, [r0]
   e:	1c5a      	adds	r2, r3, #1
  10:	e8c0 2f41 	strexb	r1, r2, [r0]
  14:	2900      	cmp	r1, #0
  16:	d1f8      	bne.n	a <pm_unblock+0xa>
  18:	b2db      	uxtb	r3, r3
  1a:	f3bf 8f5b 	dmb	ish
  1e:	2bff      	cmp	r3, #255	; 0xff
  20:	d000      	beq.n	24 <pm_unblock+0x24>
  22:	bd08      	pop	{r3, pc}
  24:	f7ff fffe 	bl	0 <_assert_panic>
  28:	00000000 	.word	0x00000000

Judging by instruction count, the atomic variant should be slower: it returns at 0x22, versus 0x18 for the irq_disable() variant.

For the SAMR21, the atomic variant looks like this:

00000000 <pm_block>:
   0:	4b05      	ldr	r3, [pc, #20]	; (18 <pm_block+0x18>)
   2:	b510      	push	{r4, lr}
   4:	2205      	movs	r2, #5
   6:	2101      	movs	r1, #1
   8:	1818      	adds	r0, r3, r0
   a:	f7ff fffe 	bl	0 <__atomic_fetch_add_1>
   e:	28ff      	cmp	r0, #255	; 0xff
  10:	d000      	beq.n	14 <pm_block+0x14>
  12:	bd10      	pop	{r4, pc}
  14:	f7ff fffe 	bl	0 <_assert_panic>
  18:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	4b05      	ldr	r3, [pc, #20]	; (18 <pm_unblock+0x18>)
   2:	b510      	push	{r4, lr}
   4:	2205      	movs	r2, #5
   6:	2101      	movs	r1, #1
   8:	1818      	adds	r0, r3, r0
   a:	f7ff fffe 	bl	0 <__atomic_fetch_sub_1>
   e:	2800      	cmp	r0, #0
  10:	d000      	beq.n	14 <pm_unblock+0x14>
  12:	bd10      	pop	{r4, pc}
  14:	f7ff fffe 	bl	0 <_assert_panic>
  18:	00000000 	.word	0x00000000

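So the CPU has no native atomic read-modify-write support (ARMv6-M, which the Cortex-M0+ implements, has no LDREX/STREX); instead, the compiler calls helper functions such as `__atomic_fetch_add_1`, which disable and re-enable interrupts.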
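Conceptually, such a helper makes the read-modify-write atomic by masking interrupts around it, which is sufficient on a single-core MCU. The sketch below is an illustration, not RIOT's actual implementation: `demo_fetch_add_u8` and the `demo_irq_*` stand-ins (which do nothing on a host build) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host stand-ins for an irq API; on a Cortex-M0+ these would
 * save PRIMASK and execute cpsid i, then restore PRIMASK. */
static unsigned demo_irq_disable(void) { return 0; }
static void demo_irq_restore(unsigned state) { (void)state; }

/* Fetch-and-add on a core without exclusive load/store instructions:
 * with interrupts masked, nothing else can run between read and write. */
uint8_t demo_fetch_add_u8(volatile uint8_t *ptr, uint8_t val)
{
    unsigned state = demo_irq_disable();
    uint8_t old = *ptr;
    *ptr = (uint8_t)(old + val);   /* wraps modulo 256 like the real counter */
    demo_irq_restore(state);
    return old;                    /* fetch_add returns the previous value */
}
```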

@kfessel kfessel merged commit c354ab6 into RIOT-OS:master Nov 8, 2022
bors bot added a commit that referenced this pull request Jan 16, 2023
18477: gnrc_static: add static network configuration r=miri64 a=benpicco



19155: Revert "sys/pm_layered: pm_(un)block add attribute optimize(3)" r=benpicco a=Teufelchen1

Revert "sys/pm_layered: pm_(un)block add attribute optimize(3) -shortens hotpath"

This reverts commit 5447203.

### Contribution description

Compiling `examples/gnrc_networking_mac` using `TOOLCHAIN=llvm` yields the following error:
```
RIOT/sys/pm_layered/pm.c:77:16: error: unknown attribute 'optimize' ignored [-Werror,-Wunknown-attributes]
__attribute__((optimize(3)))
```
As indicated, this is because the `optimize` attribute is GCC-only and not supported by Clang/LLVM.
Compare the attribute documentation of [GCC](https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html) and [LLVM](https://clang.llvm.org/docs/AttributeReference.html).
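If the attribute were to be kept for GCC builds, one option (not what this revert does) is to hide it behind a preprocessor guard so that Clang never sees it. `HOTPATH_OPTIMIZE` and `demo_hot_function` are hypothetical names. Clang also defines `__GNUC__`, hence the explicit `__clang__` check.

```c
#include <assert.h>

/* Apply GCC's optimize attribute only when compiling with GCC itself;
 * under Clang the macro expands to nothing, so no unknown-attribute warning. */
#if defined(__GNUC__) && !defined(__clang__)
#  define HOTPATH_OPTIMIZE __attribute__((optimize(3)))
#else
#  define HOTPATH_OPTIMIZE
#endif

static unsigned demo_counter;

HOTPATH_OPTIMIZE
void demo_hot_function(void)
{
    demo_counter++;
}
```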


### Testing procedure

Since this should only affect performance and not behavior, no special testing is needed. I am not aware of any tests in RIOT which could verify that assumption.

### Issues/PRs references

Introduced in #18846

There is another instance of this attribute in [shell_lock.c](https://github.com/RIOT-OS/RIOT/blob/6fb340d654ac8da07759cb9199c6aaa478589aa7/sys/shell_lock/shell_lock.c#L80). Since that usage is security-related, I have omitted it from this PR.


Co-authored-by: Benjamin Valentin <benjamin.valentin@ml-pa.com>
Co-authored-by: Teufelchen1 <bennet.blischke@haw-hamburg.de>
bors bot added a commit that referenced this pull request Jan 16, 2023
18477: gnrc_static: add static network configuration r=miri64 a=benpicco



19155: Revert "sys/pm_layered: pm_(un)block add attribute optimize(3)" r=maribu a=Teufelchen1

Co-authored-by: Benjamin Valentin <benjamin.valentin@ml-pa.com>
Co-authored-by: Teufelchen1 <bennet.blischke@haw-hamburg.de>
bors bot added a commit that referenced this pull request Jan 16, 2023
18477: gnrc_static: add static network configuration r=miri64 a=benpicco



19101: CI: update check-labels-action r=miri64 a=kaspar030



19155: Revert "sys/pm_layered: pm_(un)block add attribute optimize(3)" r=maribu a=Teufelchen1

Co-authored-by: Benjamin Valentin <benjamin.valentin@ml-pa.com>
Co-authored-by: Kaspar Schleiser <kaspar@schleiser.de>
Co-authored-by: Teufelchen1 <bennet.blischke@haw-hamburg.de>
@kaspar030 kaspar030 added this to the Release 2023.01 milestone Jan 19, 2023
Labels
Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR
5 participants