CMSIS routines for `sin`/`cos` are not used in FOC #67

pattacini · 2023-11-21T21:00:24Z

While I was looking at the generated code of the FOC in the AMCBLDC, I spotted that the trigonometric functions for sin and cos are implemented by resorting to std instead of CMSIS.

I spent quite some time investigating this issue and, from the code generation report, it seems that the replacement library for the Cortex-M correctly detects the presence of sin and cos although it doesn't apply the substitution.

To check the performance gain, we can simply replace std::sin and std::cos with arm_sin_f32 and arm_cos_f32. However, the problem still remains when it comes to code generation.

There are easy alternatives that recruit proper approximations of sin and cos but the reason why the link to CMSIS is not performed is unclear.

Definitely, worth investigating.

The text was updated successfully, but these errors were encountered:

pattacini · 2023-11-21T21:01:29Z

cc @sgiraz @mfussi66 @marcoaccame @Nicogene

pattacini · 2023-11-22T08:30:05Z

For the sake of clearness, std::sin and std::cos are being used also in our official code, so this is happening irrespective of the MATLAB release.

marcoaccame · 2023-11-22T08:35:35Z

Hi @pattacini, good spot.

I have just verified that the amcbldc does NOT use any cmsis function apart arm_mult_f32(), needed in thermal_model.cpp.

pattacini · 2023-11-22T08:48:23Z

Yep @marcoaccame 👍🏻
The main routines needing CMSIS are actually sin and cos, although we can check it out more deeply.

pattacini · 2023-11-26T16:36:38Z

I think I got it.

I made a very simple model to investigate this issue, which is illustrated below.

I compared the code generated from the same Sin block, where in the first instance I used the plain function, while in the second instance (in green) I enabled the approximation.

The first block calls std::sin, whereas the second block calls our beloved arm_sin_f32 🎉

The interpretation is that MathWorks has enhanced the options available in certain functions to let the user decide whether to go with the approximation (i.e., CMSIS¹) or not.

CMSIS applies nothing more than a lookup table. Therefore, if we don't use the Code Replacement Library, the generated code contains the MathWorks lookup table; otherwise, CMSIS gets linked. ↩

pattacini · 2023-11-26T16:40:07Z

Once ready, I'll apply the necessary fix via:

Models updated to R2023b and declared Ts_* constants #64 (comment).

cc @Nicogene @sgiraz @mfussi66 @marcoaccame @maggia80

pattacini · 2023-12-14T07:50:23Z

Hi @sgiraz @Nicogene

I would keep this task in review until we provide a statistics of the speed of the FOC before and after the PR #64.

We do expect improvements in terms of timing now that we are again hooked on the CMSIS for sin and cos functions.

sgiraz · 2023-12-14T15:06:33Z

Hi @pattacini

here is the comparison of the statistics between r2023a and r2023b for the FOC control of both duration and frequency

AMCBLDC (experiment_supervisor_user FOC timing test)

	2023a	2023b (latest devel)
FOC Duration	~14 us	~14 us
FOC Period	~36 us	~36 us
Events Viewer

As we can see they are pretty similar. "Unfortunately", does not change a lot :(

cc @Nicogene @marcoaccame

pattacini · 2023-12-14T15:16:01Z

This is quite surprising. Essentially, this means that calls to std::sin (or sinf) and calls to arm_sin_f32 are the same in terms of performance.

At any rate, in the recent past, I saw that the LUT used by CMSIS contains 512 positions. Maybe, too many.

sgiraz · 2023-12-14T15:31:16Z

This is quite surprising. Essentially, this means that calls to std::sin (or sinf) and calls to arm_sin_f32 are the same in terms of performance.

Maybe we can try to create an ad-hoc test to validate this statement further. A simple application with the call to the std::sin first and then a call to the arm_sin_f32 function measuring the execution time. Let me know if you think it could be necessary/helpful.

pattacini · 2023-12-14T15:32:40Z

Maybe we can try to create an ad-hoc test to validate this statement further. A simple application with the call to the std::sin first and then a call to the arm_sin_f32 function measuring the execution time. Let me know if you think it could be necessary/helpful.

Good idea 👍🏻

pattacini · 2023-12-30T13:33:24Z

Considering the cumbersomeness required by the use of CMSIS, if we demonstrate practically that STD and CMSIS perform equally, then I'm in favor of rolling back to STD.

cc @sgiraz @mfussi66

pattacini · 2024-01-02T16:30:22Z

A couple of relevant resources:

Unless we want to try out CORDIC with the native MATLAB implementation of SinCos, they tell us that it's probably a good option to roll back to STD.

Let's do the planned tests anyway!

sgiraz · 2024-01-19T13:38:34Z

I performed a quick test using the basic project on amc2c. The test involves 100 iterations of cosine calculations from 0 to 2pi using both std::cos and arm_cos_f32.

	std::cos	arm_cos_f32	std::sin	arm_sin_f32
`amcbldc time`	0.5526 us	0.4254 us	0.5880 us	0.4259 us
`amc2c time`	1.6621 us	1.21985 us	1.73425 us	1.1984 us

Note

This table contains the average time used to calculate the trigonometric function 1 time.

Here is the code I used for the above tests:

void test_init();
void test_on();
void test_off();

constexpr uint32_t N_TIMES {100};
constexpr float32_t step = (2*PI) / N_TIMES;

int main() {
    test_init();
    
    // Preparing the array of samples
    std::array<float_t, N_TIMES> values = {0};
    for(int i = 0; i < N_TIMES; i++)
    {
        values[i] =  i*step;
    }  
    
    __disable_irq();

    for(;;)
    {
        test_on();
        
        for(const auto& value : values)
        {
            //std::cos(value);
            //arm_cos_f32(value);
            //arm_sin_f32(value);
            std::sin(value);

        }
        
        test_off();
    }
}

pattacini · 2024-01-24T10:25:45Z

To sum up, we learned that:

AMC is almost 3 times slower than AMCBLDC for these ops.
For AMCBLDC, STD and CMSIS perform similarly.
For AMC, STD requires $\approx 1$ $\mu s$ more than CMSIS per FOC instance (1 cos and 1 sin).

Important

I would switch back to STD, honestly, also to spare the safe-guard required by using CMSIS.

pattacini · 2024-01-24T12:56:27Z

I would switch back to STD, honestly, also to spare the #73 (comment) required by using CMSIS.

Changed implemented in #75.
Closing

@mfussi66, we ought to update the new arch model accordingly.

cc @Nicogene @sgiraz

pattacini · 2024-01-26T08:51:13Z

Opened again for further investigations.

sgiraz · 2024-01-26T18:59:12Z

Hi guys,

Today @marcoaccame and I tried to analyze even further the measurements of sin function between amc2c and amcbldc.

The code

constexpr uint32_t N_TIMES {1};                 // or 100 in the last case
constexpr float32_t step = (2*PI) / N_TIMES;    // values from 0 to 2PI

int main() {
    test_init();
    
    // Preparing the array of samples
    std::array<float_t, N_TIMES> values = {0};
    for(int i = 0; i < N_TIMES; i++)
    {
        values[i] =  i*step;
    }  
    
    __disable_irq();

    for(;;)
    {
        test_on();
        for(int i = 0; i < N_TIMES; i++)
        {
            //std::cos(value);
            //arm_cos_f32(value);
            std::sin(values[i]);
            //arm_sin_f32(values[i]);
        }
        test_off();
    }
}

on `amc2c` in usec (I*STEP)

std::sin	arm_sin_f32	optimization
3.08	2.90	-O0
0.54	1.62	balanced
0.58	1.64	fast
0.58	1.17	fast but divided by 100

on `amcbldc` in usec (I*STEP)

std::sin	arm_sin_f32	optimization
1.40	1.25	-O0
1.26	0.58	balanced
1.27?	0.59	fast
1.27?	0.37	fast but divided by 100

Note

when processing using the std::xxx, I excluded the arm_cortexM4lf_math.lib from the project in order to avoid unexpected/unwanted sideffects

Doing the Math

amc2c_CPU_Ts = 1/200 = 0.005 us
amcbldc_CPU_Ts = 1/168 = 0.00595238095238095238095238095238 us

Considering the std::sin VS the arm_sin_f32 using the balanced optimization

$\frac{1.26}{0.54} = \frac{amcbldc CPU_{Ts}}{amc2c CPU_{Ts}}$

$1.26 = \left( \frac{200}{168} \right) \cdot 0.54$

$1.26 = ~0.64$

Considering this result, we believe that the amc2c is faster than the amcbldc.

cc @pattacini @Nicogene

pattacini · 2024-01-26T19:26:58Z

The purpose of this issue is not to compare AMC vs. AMCBLDC but rather STD vs. CMSIS. In light of this, the results above seem to be contradictory.

How come CMSIS is far slower than STD on AMC?

A couple of more questions:

How do we justify the previous results with the latest one?
What is the optimization level we use by default in our FW?

pattacini · 2024-01-26T19:33:49Z

AMC pure math ops can be faster than those of AMCBLDC but as soon as there is some access to other resources we know that AMC is slower than AMCBLDC.

To a certain extent, this can explain why CMSIS on AMC is slower than STD as CMSIS probably requires several accesses to memory.

Nicogene · 2024-01-29T09:51:58Z

the results above seem to be contradictory.

In this sense, after the alignment at the standup, we decided to measure these timings pulling up and down pins and measuring the time elapsed with the oscilloscope.

cc @pattacini @marcoaccame @sgiraz

sgiraz · 2024-01-30T17:27:58Z

⚠️ [ WORK IN PROGRESS ] ⚠️

Today I started the measurements with the oscilloscope.
In agreement with @marcoaccame we decided to measure the FOC directly w/ and w/o the CMSIS enabled on amc2c.

Here is the code snippet:

    // TODO: set GPIO here (ON)
    //embot::hw::testpoint::on(tp1);                                                                                // OK
    //HAL_GPIO_WritePin(GPIOA, 0x0020, GPIO_PIN_SET);  // PORT(0), PIN(5), STATE(0=OFF (RESET), 1=ON (SET))           // OK
    GPIOA->BSRR = 0x0020;
    
    AMC_BLDC_step_FOC();
    
    // TODO: set GPIO here (OFF)
    //embot::hw::testpoint::off(tp1);
    //HAL_GPIO_WritePin(GPIOA, 0x0020, GPIO_PIN_RESET);  // PORT(0), PIN(5), STATE(0=OFF (RESET), 1=ON (SET))
    GPIOA->BSRR = (uint32_t)0x0020 << 16U;

`amc2c` FW@devel (using `CMSIS`, `-Os balanced` )

(14.80us w/ optimization -0fast)

`amc2c` FW@devel (using `std::sin` and `std::cos`, `-Os balanced` )

(14.20us w/ optimization -0fast)

Code changes in FOCInnerLoop.cpp:

-  q = arm_sin_f32(rtb_y);
+  q = std::sin(rtb_y);

...
-  rtb_Cos = arm_cos_f32(rtb_y);
+  rtb_Cos = std::cos(rtb_y);

Event viewer comparison

I also verified the measures with the EventViewer offered by Keil uVision (see embot::app::scope::SignalEViewer).
Using the same configuration as in the last case mentioned above: amc2c, std::sin, std::cos, -Os balanced, we got similar results:

`amcbldc` FW@devel (using `CMSIS`, `-Os balanced` )

`amcbldc` FW@devel (using `std::sin` and `std::cos`, `-Os balanced` )

cc @Nicogene

pattacini · 2024-02-01T13:52:27Z

Let me try to recap what I got from a F2F alignment with @sgiraz.

There's pretty much NO difference between using STD or CMSIS ops on AMCBLDC and AMC2C.
On the AMCBLDC, the FOC is taking up $8$ $\mu s$ and NOT $17$ $\mu s$. The difference is represented by the ADC operations.
FOC takes up $14$ $\mu s$ on AMC2C, longer than on AMCBLDC.

Point 1 completes this issue ✅ We go with STD. See FOC: use STD instead of CMSIS #75.
Follow-up question on Point 2: "Why does ADC last that long compared to FOC? Is there any margin for improvement?"
Follow-up question on Point 3: "Why is AMC2C slower than AMCBLDC?" We know this is an ongoing study and we're probably at the right point to contact ST.

cc @Nicogene @sgiraz @marcoaccame @maggia80 @mfussi66

Nicogene · 2024-02-02T13:46:27Z

I think that this kind of analysis went a little bit out of scope, the @sgiraz analysis allowed to understand that we are good at using the std sin/cos.
About the timings, I will open a follow-up issue for having a final measure verdict on the FOC duration of AMCBLDC

cc @sgiraz @pattacini @marcoaccame

pattacini self-assigned this Nov 21, 2023

pattacini added team-fix Activity performed by Team FIX domain-firmware Pertains to FW development domain-modeling Pertains to model-based design type-board-amcbldc Relevant to the AMCBLDC development labels Nov 21, 2023

pattacini linked a pull request Dec 8, 2023 that will close this issue

Models updated to R2023b and declared Ts_* constants #64

Merged

Nicogene assigned sgiraz Jan 17, 2024

pattacini mentioned this issue Jan 24, 2024

FOC: use STD instead of CMSIS #75

Merged

pattacini linked a pull request Jan 24, 2024 that will close this issue

FOC: use STD instead of CMSIS #75

Merged

pattacini closed this as completed Jan 24, 2024

pattacini reopened this Jan 26, 2024

Nicogene mentioned this issue Feb 2, 2024

Analyze FOC timing of AMCBLDC and AMC2C #80

Closed

Nicogene closed this as completed Feb 2, 2024

mfussi66 mentioned this issue Aug 30, 2024

Run a test campaign before merging AMCBLDC_experimental and AMCBLDC #92

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CMSIS routines for `sin`/`cos` are not used in FOC #67

CMSIS routines for `sin`/`cos` are not used in FOC #67

pattacini commented Nov 21, 2023 •

edited

Loading

pattacini commented Nov 21, 2023

pattacini commented Nov 22, 2023

marcoaccame commented Nov 22, 2023

pattacini commented Nov 22, 2023 •

edited

Loading

pattacini commented Nov 26, 2023 •

edited

Loading

pattacini commented Nov 26, 2023 •

edited

Loading

pattacini commented Dec 14, 2023

sgiraz commented Dec 14, 2023 •

edited by pattacini

Loading

pattacini commented Dec 14, 2023

sgiraz commented Dec 14, 2023

pattacini commented Dec 14, 2023

pattacini commented Dec 30, 2023

pattacini commented Jan 2, 2024 •

edited

Loading

sgiraz commented Jan 19, 2024 •

edited

Loading

pattacini commented Jan 24, 2024 •

edited

Loading

pattacini commented Jan 24, 2024 •

edited

Loading

pattacini commented Jan 26, 2024

sgiraz commented Jan 26, 2024 •

edited

Loading

pattacini commented Jan 26, 2024 •

edited

Loading

pattacini commented Jan 26, 2024

Nicogene commented Jan 29, 2024

sgiraz commented Jan 30, 2024 •

edited

Loading

pattacini commented Feb 1, 2024 •

edited

Loading

Nicogene commented Feb 2, 2024 •

edited

Loading

CMSIS routines for sin/cos are not used in FOC #67

CMSIS routines for sin/cos are not used in FOC #67

Comments

pattacini commented Nov 21, 2023 • edited Loading

pattacini commented Nov 21, 2023

pattacini commented Nov 22, 2023

marcoaccame commented Nov 22, 2023

pattacini commented Nov 22, 2023 • edited Loading

pattacini commented Nov 26, 2023 • edited Loading

Footnotes

pattacini commented Nov 26, 2023 • edited Loading

pattacini commented Dec 14, 2023

sgiraz commented Dec 14, 2023 • edited by pattacini Loading

AMCBLDC (experiment_supervisor_user FOC timing test)

pattacini commented Dec 14, 2023

sgiraz commented Dec 14, 2023

pattacini commented Dec 14, 2023

pattacini commented Dec 30, 2023

pattacini commented Jan 2, 2024 • edited Loading

sgiraz commented Jan 19, 2024 • edited Loading

pattacini commented Jan 24, 2024 • edited Loading

pattacini commented Jan 24, 2024 • edited Loading

pattacini commented Jan 26, 2024

sgiraz commented Jan 26, 2024 • edited Loading

The code

on amc2c in usec (I*STEP)

on amcbldc in usec (I*STEP)

Doing the Math

$\frac{1.26}{0.54} = \frac{amcbldc CPU_{Ts}}{amc2c CPU_{Ts}}$

$1.26 = \left( \frac{200}{168} \right) \cdot 0.54$

$1.26 = ~0.64$

pattacini commented Jan 26, 2024 • edited Loading

pattacini commented Jan 26, 2024

Nicogene commented Jan 29, 2024

sgiraz commented Jan 30, 2024 • edited Loading

⚠️ [ WORK IN PROGRESS ] ⚠️

amc2c FW@devel (using CMSIS, -Os balanced )

amc2c FW@devel (using std::sin and std::cos, -Os balanced )

Event viewer comparison

amcbldc FW@devel (using CMSIS, -Os balanced )

amcbldc FW@devel (using std::sin and std::cos, -Os balanced )

pattacini commented Feb 1, 2024 • edited Loading

Nicogene commented Feb 2, 2024 • edited Loading

CMSIS routines for `sin`/`cos` are not used in FOC #67

CMSIS routines for `sin`/`cos` are not used in FOC #67

pattacini commented Nov 21, 2023 •

edited

Loading

pattacini commented Nov 22, 2023 •

edited

Loading

pattacini commented Nov 26, 2023 •

edited

Loading

pattacini commented Nov 26, 2023 •

edited

Loading

sgiraz commented Dec 14, 2023 •

edited by pattacini

Loading

pattacini commented Jan 2, 2024 •

edited

Loading

sgiraz commented Jan 19, 2024 •

edited

Loading

pattacini commented Jan 24, 2024 •

edited

Loading

pattacini commented Jan 24, 2024 •

edited

Loading

sgiraz commented Jan 26, 2024 •

edited

Loading

on `amc2c` in usec (I*STEP)

on `amcbldc` in usec (I*STEP)

pattacini commented Jan 26, 2024 •

edited

Loading

sgiraz commented Jan 30, 2024 •

edited

Loading

`amc2c` FW@devel (using `CMSIS`, `-Os balanced` )

`amc2c` FW@devel (using `std::sin` and `std::cos`, `-Os balanced` )

`amcbldc` FW@devel (using `CMSIS`, `-Os balanced` )

`amcbldc` FW@devel (using `std::sin` and `std::cos`, `-Os balanced` )

pattacini commented Feb 1, 2024 •

edited

Loading

Nicogene commented Feb 2, 2024 •

edited

Loading