feat: FIXMATH_NO_HARD_DIVISION option; update README; test-refactor #49

JPHutchins · 2023-08-08T05:11:12Z

Hello again!

This adds a "NO_HARD_DIVISION" option that is separate from the "OPTIMIZE_8BIT" option. The reason is that I am building for STM32G0 (Cortex-M0+) which does not have hardware division but does have 32-bit multiplication.

Perhaps you have some advice. From ARM:

"""
The MULS instruction provides a 32-bit x 32-bit multiply that returns the least-significant 32-bits of the result.
"""

https://developer.arm.com/documentation/ddi0484/c/Introduction/Configurable-options/Configurable-multiplier?lang=en
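
For reference, a 16.16 multiply can be built from nothing wider than the 32 x 32 -> 32 multiply that MULS provides, by splitting each operand into 16-bit halves. The following is a minimal sketch under that assumption. The names `mul16_16_narrow` and `mul16_16_narrow_spot_checks` are hypothetical; the sketch truncates rather than rounds and does no overflow detection, so it should not be read as the library's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of libfixmath: 16.16 multiply that only
 * needs the low 32 bits of each 32-bit product. Truncates toward negative
 * infinity and performs no overflow detection. */
int32_t mul16_16_narrow(int32_t a, int32_t c)
{
    uint32_t ua = (uint32_t)a, uc = (uint32_t)c;

    /* High halves, sign-extended by hand so no signed shift is needed. */
    uint32_t A = ua >> 16, C = uc >> 16;
    if (a < 0) A |= 0xFFFF0000u;
    if (c < 0) C |= 0xFFFF0000u;

    /* Low halves are plain unsigned 16-bit values. */
    uint32_t B = ua & 0xFFFFu, D = uc & 0xFFFFu;

    /* (a*c) >> 16 == (A*C << 16) + A*D + C*B + (B*D >> 16), modulo 2^32. */
    uint32_t r = ((A * C) << 16) + A * D + C * B + ((B * D) >> 16);

    /* Converting back to int32_t assumes the usual two's-complement wrap
     * (implementation-defined in C, but what GCC and Clang do). */
    return (int32_t)r;
}

/* A few spot checks in 16.16 format (0x10000 == 1.0). */
void mul16_16_narrow_spot_checks(void)
{
    assert(mul16_16_narrow(0x28000, 0x40000) == 0xA0000); /* 2.5 * 4.0 = 10.0 */
    assert(mul16_16_narrow(-65536, 65536) == -65536);     /* -1.0 * 1.0 = -1.0 */
    assert(mul16_16_narrow(-32768, 32768) == -16384);     /* -0.5 * 0.5 = -0.25 */
}
```

Every multiplication here is `uint32_t` by `uint32_t` with a `uint32_t` result, which is the operation the quoted MULS description covers.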

Presently I am compiling with FIXMATH_NO_HARD_DIVISION and FIXMATH_NO_OVERFLOW and am primarily concerned with program size. fix16_mul ends up being 240B with these instructions:

0000031c <fix16_mul>:
 31c:   e92d0bf0        push    {r4, r5, r6, r7, r8, r9, fp}
 320:   e28db018        add     fp, sp, #24
 324:   e24dd01c        sub     sp, sp, #28
 328:   e50b0030        str     r0, [fp, #-48]  ; 0xffffffd0
 32c:   e50b1034        str     r1, [fp, #-52]  ; 0xffffffcc
 330:   e51b1030        ldr     r1, [fp, #-48]  ; 0xffffffd0
 334:   e1a00fc1        asr     r0, r1, #31
 338:   e1a06001        mov     r6, r1
 33c:   e1a07000        mov     r7, r0
 340:   e51b1034        ldr     r1, [fp, #-52]  ; 0xffffffcc
 344:   e1a00fc1        asr     r0, r1, #31
 348:   e1a04001        mov     r4, r1
 34c:   e1a05000        mov     r5, r0
 350:   e0000794        mul     r0, r4, r7
 354:   e0010596        mul     r1, r6, r5
 358:   e0801001        add     r1, r0, r1
 35c:   e0802496        umull   r2, r0, r6, r4
 360:   e1a03000        mov     r3, r0
 364:   e0811003        add     r1, r1, r3
 368:   e1a03001        mov     r3, r1
 36c:   e50b2024        str     r2, [fp, #-36]  ; 0xffffffdc
 370:   e50b3020        str     r3, [fp, #-32]  ; 0xffffffe0
 374:   e50b2024        str     r2, [fp, #-36]  ; 0xffffffdc
 378:   e50b3020        str     r3, [fp, #-32]  ; 0xffffffe0
 37c:   e24b3024        sub     r3, fp, #36     ; 0x24
 380:   e893000c        ldm     r3, {r2, r3}
 384:   e3530000        cmp     r3, #0
 388:   aa000005        bge     3a4 <fix16_mul+0x88>
 38c:   e24b3024        sub     r3, fp, #36     ; 0x24
 390:   e893000c        ldm     r3, {r2, r3}
 394:   e2528001        subs    r8, r2, #1
 398:   e2c39000        sbc     r9, r3, #0
 39c:   e50b8024        str     r8, [fp, #-36]  ; 0xffffffdc
 3a0:   e50b9020        str     r9, [fp, #-32]  ; 0xffffffe0
 3a4:   e24b1024        sub     r1, fp, #36     ; 0x24
 3a8:   e8910003        ldm     r1, {r0, r1}
 3ac:   e3a02000        mov     r2, #0
 3b0:   e3a03000        mov     r3, #0
 3b4:   e1a02820        lsr     r2, r0, #16
 3b8:   e1822801        orr     r2, r2, r1, lsl #16
 3bc:   e1a03841        asr     r3, r1, #16
 3c0:   e1a03002        mov     r3, r2
 3c4:   e50b3028        str     r3, [fp, #-40]  ; 0xffffffd8
 3c8:   e24b1024        sub     r1, fp, #36     ; 0x24
 3cc:   e8910003        ldm     r1, {r0, r1}
 3d0:   e3a02000        mov     r2, #0
 3d4:   e3a03000        mov     r3, #0
 3d8:   e1a027a0        lsr     r2, r0, #15
 3dc:   e1822881        orr     r2, r2, r1, lsl #17
 3e0:   e1a037c1        asr     r3, r1, #15
 3e4:   e1a03002        mov     r3, r2
 3e8:   e2032001        and     r2, r3, #1
 3ec:   e51b3028        ldr     r3, [fp, #-40]  ; 0xffffffd8
 3f0:   e0823003        add     r3, r2, r3
 3f4:   e50b3028        str     r3, [fp, #-40]  ; 0xffffffd8
 3f8:   e51b3028        ldr     r3, [fp, #-40]  ; 0xffffffd8
 3fc:   e1a00003        mov     r0, r3
 400:   e24bd018        sub     sp, fp, #24
 404:   e8bd0bf0        pop     {r4, r5, r6, r7, r8, r9, fp}
 408:   e12fff1e        bx      lr

Yet I see the umull instruction in there, which is 32 x 32 -> 64.
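
Reading the listing back into C may help show where the umull comes from: the two operands are sign-extended to 64 bits, multiplied, adjusted by one if the product is negative, shifted right by 16, and then rounded using bit 15. The sketch below is a reconstruction from the disassembly above, not a copy of the library source, and the names `fix16_mul_wide_sketch` and `fix16_mul_wide_spot_checks` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fix16_sketch_t; /* 16.16 fixed point; hypothetical alias */

/* Hypothetical reconstruction of the 64-bit code path seen in the listing.
 * Assumes >> on a negative int64_t is an arithmetic shift (true for GCC
 * and Clang, implementation-defined in ISO C). */
fix16_sketch_t fix16_mul_wide_sketch(fix16_sketch_t a, fix16_sketch_t b)
{
    /* 32 x 32 -> 64 multiply: this is what the compiler lowers to umull
     * plus two mul instructions for the sign-extended high words. */
    int64_t product = (int64_t)a * b;

    /* Matches the cmp/bge/subs/sbc sequence: bias negative products by one. */
    if (product < 0)
        product--;

    /* Keep the middle 32 bits, then round using bit 15 of the product. */
    fix16_sketch_t result = (fix16_sketch_t)(product >> 16);
    result += (fix16_sketch_t)((product & 0x8000) >> 15);
    return result;
}

/* A few spot checks in 16.16 format (0x10000 == 1.0). */
void fix16_mul_wide_spot_checks(void)
{
    assert(fix16_mul_wide_sketch(0x10000, 0x10000) == 0x10000); /* 1.0 * 1.0 */
    assert(fix16_mul_wide_sketch(0x28000, 0x40000) == 0xA0000); /* 2.5 * 4.0 */
    assert(fix16_mul_wide_sketch(-65536, 65536) == -65536);     /* -1.0 * 1.0 */
    assert(fix16_mul_wide_sketch(0x18000, 1) == 2);             /* rounds up */
}
```

With a 64-bit intermediate like this, the compiler is free to use a widening multiply whenever the instruction set it was told to target provides one. The 32-bit ARM encodings in the listing (for example e92d0bf0) also suggest the object was not built for a Thumb-only core such as the Cortex-M0+; that is an inference from the listing rather than something confirmed in this thread.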

Test generation is refactored slightly to remove repetition.

Cheers,
J.P.

PetteriAimonen · 2023-08-08T06:08:49Z

The assembler listing looks quite odd. Are you sure you have compiler optimizations on? -Os is usually a good choice. For example, the instructions at 3f4 and 3f8 store and then immediately reload the same value, which makes me think this was compiled with -O0 (no optimizations).

JPHutchins · 2023-08-08T18:04:35Z

> The assembler listing looks quite odd. Are you sure you have compiler optimizations on? -Os is usually a good choice. For example, the instructions at 3f4 and 3f8 store and then immediately reload the same value, which makes me think this was compiled with -O0 (no optimizations).

Of course! It's down to 40B now 🎉

I must have expected the imported project to inherit from the parent CMake project's compile options. Of course, that would be bad if it was implicit, but I also can't figure out how to make it explicit! For future reference, this stuff is fairly common:

target_compile_options(libfixmath PRIVATE
    -ffunction-sections
    -fdata-sections
    -Os
)

feat: FIXMATH_NO_HARD_DIVISION option; update README; test-refactor

4475ffb

PetteriAimonen merged commit d308e46 into PetteriAimonen:master Aug 8, 2023
1 check passed

JPHutchins deleted the feature/hardware-division-option branch August 8, 2023 18:04
