feat: FIXMATH_NO_HARD_DIVISION option; update README; test-refactor #49

JPHutchins
Contributor

@JPHutchins JPHutchins commented Aug 8, 2023

Hello again!

This adds a "NO_HARD_DIVISION" option that is separate from the "OPTIMIZE_8BIT" option. The reason is that I am building for an STM32G0 (Cortex-M0+), which does not have hardware division but does have 32-bit multiplication.
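
For readers unfamiliar with what avoiding hardware division involves, here is a minimal sketch of a Q16.16 divide done with shift-and-subtract (restoring) long division. It is not libfixmath's implementation: the function name, the choice to return 0 when dividing by zero, and the truncating (non-rounding) behaviour are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper, not libfixmath's code: Q16.16 division by
 * restoring (shift-and-subtract) long division, so no divide
 * instruction is needed. Truncates toward zero. Returns 0 for
 * b == 0 (an arbitrary choice for this sketch). Quotients that do
 * not fit in 32 bits simply wrap. */
static int32_t q16_div_shift_subtract(int32_t a, int32_t b)
{
    if (b == 0)
        return 0;

    /* Work on magnitudes; 0u - x avoids signed overflow for INT32_MIN. */
    uint32_t n = (a < 0) ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t d = (b < 0) ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t rem = 0, quot = 0;

    /* The dividend is n << 16 (48 bits); feed it in MSB first. */
    for (int i = 47; i >= 0; i--) {
        uint32_t bit = (i >= 16) ? ((n >> (i - 16)) & 1u) : 0u;
        rem = (rem << 1) | bit;   /* rem < d <= 2^31, so this fits */
        quot <<= 1;
        if (rem >= d) {
            rem -= d;
            quot |= 1u;
        }
    }

    /* Reapply the sign of the result. */
    uint32_t result = ((a < 0) != (b < 0)) ? 0u - quot : quot;
    return (int32_t)result;
}
```

For example, dividing 1.0 (`0x10000`) by 2.0 (`0x20000`) yields 0.5 (`0x8000`). The loop trades speed for size: 48 iterations of shifts, compares, and subtracts.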

Perhaps you have some advice. From ARM:

"""
The MULS instruction provides a 32-bit x 32-bit multiply that returns the least-significant 32-bits of the result.
"""

https://developer.arm.com/documentation/ddi0484/c/Introduction/Configurable-options/Configurable-multiplier?lang=en
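
Because MULS returns only the low 32 bits, a full 64-bit product has to be assembled from smaller pieces when no widening multiply exists. The sketch below shows one way to do that with 16-bit partial products. The function name is hypothetical, and this is not taken from libfixmath or from any compiler runtime, which may do it differently.

```c
#include <stdint.h>

/* Hypothetical helper: 32 x 32 -> 64 unsigned multiply using only
 * multiplies whose result fits in 32 bits (16 x 16 partial products). */
static uint64_t umul32x32_to_64(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* Each partial product is at most 0xFFFF * 0xFFFF, so it fits. */
    uint32_t lo_lo = a_lo * b_lo;
    uint32_t hi_lo = a_hi * b_lo;
    uint32_t lo_hi = a_lo * b_hi;
    uint32_t hi_hi = a_hi * b_hi;

    /* Sum the terms that land in bits 16..31; bits 16 and up of
     * cross are the carry into the upper word. */
    uint32_t cross = (lo_lo >> 16) + (hi_lo & 0xFFFFu) + (lo_hi & 0xFFFFu);

    uint32_t low  = (cross << 16) | (lo_lo & 0xFFFFu);
    uint32_t high = hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (cross >> 16);

    return ((uint64_t)high << 32) | low;
}
```

Four multiplies plus a handful of shifts and adds replace a single widening instruction, which is why a 64-bit product costs noticeably more on a core with only a 32-bit-result multiplier.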

Presently I am compiling with FIXMATH_NO_HARD_DIVISION and FIXMATH_NO_OVERFLOW and am primarily concerned with program size. fix16_mul ends up being 240 B with these instructions:

0000031c <fix16_mul>:
 31c:   e92d0bf0        push    {r4, r5, r6, r7, r8, r9, fp}
 320:   e28db018        add     fp, sp, #24
 324:   e24dd01c        sub     sp, sp, #28
 328:   e50b0030        str     r0, [fp, #-48]  ; 0xffffffd0
 32c:   e50b1034        str     r1, [fp, #-52]  ; 0xffffffcc
 330:   e51b1030        ldr     r1, [fp, #-48]  ; 0xffffffd0
 334:   e1a00fc1        asr     r0, r1, #31
 338:   e1a06001        mov     r6, r1
 33c:   e1a07000        mov     r7, r0
 340:   e51b1034        ldr     r1, [fp, #-52]  ; 0xffffffcc
 344:   e1a00fc1        asr     r0, r1, #31
 348:   e1a04001        mov     r4, r1
 34c:   e1a05000        mov     r5, r0
 350:   e0000794        mul     r0, r4, r7
 354:   e0010596        mul     r1, r6, r5
 358:   e0801001        add     r1, r0, r1
 35c:   e0802496        umull   r2, r0, r6, r4
 360:   e1a03000        mov     r3, r0
 364:   e0811003        add     r1, r1, r3
 368:   e1a03001        mov     r3, r1
 36c:   e50b2024        str     r2, [fp, #-36]  ; 0xffffffdc
 370:   e50b3020        str     r3, [fp, #-32]  ; 0xffffffe0
 374:   e50b2024        str     r2, [fp, #-36]  ; 0xffffffdc
 378:   e50b3020        str     r3, [fp, #-32]  ; 0xffffffe0
 37c:   e24b3024        sub     r3, fp, #36     ; 0x24
 380:   e893000c        ldm     r3, {r2, r3}
 384:   e3530000        cmp     r3, #0
 388:   aa000005        bge     3a4 <fix16_mul+0x88>
 38c:   e24b3024        sub     r3, fp, #36     ; 0x24
 390:   e893000c        ldm     r3, {r2, r3}
 394:   e2528001        subs    r8, r2, #1
 398:   e2c39000        sbc     r9, r3, #0
 39c:   e50b8024        str     r8, [fp, #-36]  ; 0xffffffdc
 3a0:   e50b9020        str     r9, [fp, #-32]  ; 0xffffffe0
 3a4:   e24b1024        sub     r1, fp, #36     ; 0x24
 3a8:   e8910003        ldm     r1, {r0, r1}
 3ac:   e3a02000        mov     r2, #0
 3b0:   e3a03000        mov     r3, #0
 3b4:   e1a02820        lsr     r2, r0, #16
 3b8:   e1822801        orr     r2, r2, r1, lsl #16
 3bc:   e1a03841        asr     r3, r1, #16
 3c0:   e1a03002        mov     r3, r2
 3c4:   e50b3028        str     r3, [fp, #-40]  ; 0xffffffd8
 3c8:   e24b1024        sub     r1, fp, #36     ; 0x24
 3cc:   e8910003        ldm     r1, {r0, r1}
 3d0:   e3a02000        mov     r2, #0
 3d4:   e3a03000        mov     r3, #0
 3d8:   e1a027a0        lsr     r2, r0, #15
 3dc:   e1822881        orr     r2, r2, r1, lsl #17
 3e0:   e1a037c1        asr     r3, r1, #15
 3e4:   e1a03002        mov     r3, r2
 3e8:   e2032001        and     r2, r3, #1
 3ec:   e51b3028        ldr     r3, [fp, #-40]  ; 0xffffffd8
 3f0:   e0823003        add     r3, r2, r3
 3f4:   e50b3028        str     r3, [fp, #-40]  ; 0xffffffd8
 3f8:   e51b3028        ldr     r3, [fp, #-40]  ; 0xffffffd8
 3fc:   e1a00003        mov     r0, r3
 400:   e24bd018        sub     sp, fp, #24
 404:   e8bd0bf0        pop     {r4, r5, r6, r7, r8, r9, fp}
 408:   e12fff1e        bx      lr

Yet I see the umull instruction in there, which is 32 x 32 -> 64.
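
Read step by step, the listing appears to widen both operands to 64 bits, multiply, subtract one from negative products, shift right by 16, and add bit 15 as a rounding increment. The sketch below is reconstructed from the disassembly rather than copied from the library source, and the function name is hypothetical. As a side note, the fixed 4-byte encodings suggest the listing was built for an A32 (ARM-mode) target, where `umull` is available. ARMv6-M, which the Cortex-M0+ implements, has no `umull`, so a build that actually targets that core would be expected to produce different code.

```c
#include <stdint.h>

/* Hypothetical name; reconstructed from the listing, not the library source. */
static int32_t q16_mul_sketch(int32_t a, int32_t b)
{
    /* 64-bit product; in the listing this is the mul/mul/umull group. */
    int64_t product = (int64_t)a * b;

    /* Bias negative products by one so that exact halves round away
     * from zero (the subs/sbc pair in the listing). */
    if (product < 0)
        product--;

    /* Drop the 16 extra fractional bits. Assumes an arithmetic right
     * shift for negative values, as GCC and Clang provide. */
    int32_t result = (int32_t)(product >> 16);

    /* Add bit 15 of the product as the rounding increment. */
    result += (int32_t)((product & 0x8000) >> 15);
    return result;
}
```

With this shape, 2.5 × 2.0 (`0x28000`, `0x20000`) gives 5.0 (`0x50000`), and the smallest positive value times 0.5 rounds up to the smallest positive value rather than truncating to zero.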

Test generation is refactored slightly to remove repetition.

Cheers,
J.P.

@PetteriAimonen
Owner

Somehow the assembler listing looks quite weird; are you sure you have compiler optimizations on? E.g. -Os is usually good. For example line 3f4 vs. 3f8 seems to store and load the same data right after each other, which makes me think this is compiled with -O0 (no optimizations).

@PetteriAimonen PetteriAimonen merged commit d308e46 into PetteriAimonen:master Aug 8, 2023
1 check passed
@JPHutchins
Contributor Author

> Somehow the assembler listing looks quite weird; are you sure you have compiler optimizations on? E.g. -Os is usually good. For example line 3f4 vs. 3f8 seems to store and load the same data right after each other, which makes me think this is compiled with -O0 (no optimizations).

Of course! It's down to 40B now 🎉

I must have expected the imported project to inherit the parent CMake project's compile options. Of course, that would be bad if it were implicit, but I also can't figure out how to make it explicit! For future reference, this stuff is fairly common:

target_compile_options(libfixmath PRIVATE
    -ffunction-sections
    -fdata-sections
    -Os
)

@JPHutchins JPHutchins deleted the feature/hardware-division-option branch August 8, 2023 18:04