Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CMSIS routines for sin/cos are not used in FOC #67

Closed
pattacini opened this issue Nov 21, 2023 · 24 comments · Fixed by #64 or #75
Closed

CMSIS routines for sin/cos are not used in FOC #67

pattacini opened this issue Nov 21, 2023 · 24 comments · Fixed by #64 or #75
Assignees
Labels
domain-firmware Pertains to FW development domain-modeling Pertains to model-based design team-fix Activity performed by Team FIX type-board-amcbldc Relevant to the AMCBLDC development

Comments

@pattacini
Copy link
Member

pattacini commented Nov 21, 2023

While I was looking at the generated code of the FOC in the AMCBLDC, I spotted that the trigonometric functions for sin and cos are implemented by resorting to std instead of CMSIS.

I spent quite some time investigating this issue and, from the code generation report, it seems that the replacement library for the Cortex-M correctly detects the presence of sin and cos although it doesn't apply the substitution.

To check the performance gain, we can simply replace std::sin and std::cos with arm_sin_f32 and arm_cos_f32. However, the problem still remains when it comes to code generation.

There are easy alternatives that recruit proper approximations of sin and cos but the reason why the link to CMSIS is not performed is unclear.

Definitely, worth investigating.

@pattacini pattacini self-assigned this Nov 21, 2023
@pattacini pattacini added team-fix Activity performed by Team FIX domain-firmware Pertains to FW development domain-modeling Pertains to model-based design type-board-amcbldc Relevant to the AMCBLDC development labels Nov 21, 2023
@pattacini
Copy link
Member Author

cc @sgiraz @mfussi66 @marcoaccame @Nicogene

@pattacini
Copy link
Member Author

For the sake of clearness, std::sin and std::cos are being used also in our official code, so this is happening irrespective of the MATLAB release.

@marcoaccame
Copy link

Hi @pattacini, good spot.

I have just verified that the amcbldc does NOT use any cmsis function apart arm_mult_f32(), needed in thermal_model.cpp.

@pattacini
Copy link
Member Author

pattacini commented Nov 22, 2023

Yep @marcoaccame 👍🏻
The main routines needing CMSIS are actually sin and cos, although we can check it out more deeply.

@pattacini
Copy link
Member Author

pattacini commented Nov 26, 2023

I think I got it.

I made a very simple model to investigate this issue, which is illustrated below.

Screenshot 2023-11-26 172749

I compared the code generated from the same Sin block, where in the first instance I used the plain function, while in the second instance (in green) I enabled the approximation.

The first block calls std::sin, whereas the second block calls our beloved arm_sin_f32 🎉

The interpretation is that MathWorks has enhanced the options available in certain functions to let the user decide whether to go with the approximation (i.e., CMSIS1) or not.

Footnotes

  1. CMSIS applies nothing more than a lookup table. Therefore, if we don't use the Code Replacement Library, the generated code contains the MathWorks lookup table; otherwise, CMSIS gets linked.

@pattacini
Copy link
Member Author

pattacini commented Nov 26, 2023

Once ready, I'll apply the necessary fix via:

cc @Nicogene @sgiraz @mfussi66 @marcoaccame @maggia80

@pattacini pattacini linked a pull request Dec 8, 2023 that will close this issue
@pattacini
Copy link
Member Author

Hi @sgiraz @Nicogene

I would keep this task in review until we provide a statistics of the speed of the FOC before and after the PR #64.

We do expect improvements in terms of timing now that we are again hooked on the CMSIS for sin and cos functions.

@sgiraz
Copy link
Contributor

sgiraz commented Dec 14, 2023

Hi @pattacini

here is the comparison of the statistics between r2023a and r2023b for the FOC control of both duration and frequency

AMCBLDC (experiment_supervisor_user FOC timing test)

2023a 2023b (latest devel)
FOC Duration ~14 us ~14 us
FOC Period ~36 us ~36 us
Events Viewer image image

As we can see they are pretty similar. "Unfortunately", does not change a lot :(

cc @Nicogene @marcoaccame

@pattacini
Copy link
Member Author

This is quite surprising. Essentially, this means that calls to std::sin (or sinf) and calls to arm_sin_f32 are the same in terms of performance.

At any rate, in the recent past, I saw that the LUT used by CMSIS contains 512 positions. Maybe, too many.

@sgiraz
Copy link
Contributor

sgiraz commented Dec 14, 2023

This is quite surprising. Essentially, this means that calls to std::sin (or sinf) and calls to arm_sin_f32 are the same in terms of performance.

Maybe we can try to create an ad-hoc test to validate this statement further. A simple application with the call to the std::sin first and then a call to the arm_sin_f32 function measuring the execution time. Let me know if you think it could be necessary/helpful.

@pattacini
Copy link
Member Author

Maybe we can try to create an ad-hoc test to validate this statement further. A simple application with the call to the std::sin first and then a call to the arm_sin_f32 function measuring the execution time. Let me know if you think it could be necessary/helpful.

Good idea 👍🏻

@pattacini
Copy link
Member Author

Considering the cumbersomeness required by the use of CMSIS, if we demonstrate practically that STD and CMSIS perform equally, then I'm in favor of rolling back to STD.

cc @sgiraz @mfussi66

@pattacini
Copy link
Member Author

pattacini commented Jan 2, 2024

A couple of relevant resources:

Unless we want to try out CORDIC with the native MATLAB implementation of SinCos, they tell us that it's probably a good option to roll back to STD.

Let's do the planned tests anyway!

@sgiraz
Copy link
Contributor

sgiraz commented Jan 19, 2024

I performed a quick test using the basic project on amc2c. The test involves 100 iterations of cosine calculations from 0 to 2pi using both std::cos and arm_cos_f32.

std::cos arm_cos_f32 std::sin arm_sin_f32
amcbldc time 0.5526 us 0.4254 us 0.5880 us 0.4259 us
amc2c time 1.6621 us 1.21985 us 1.73425 us 1.1984 us

Note

This table contains the average time used to calculate the trigonometric function 1 time.

Here is the code I used for the above tests:

void test_init();
void test_on();
void test_off();

constexpr uint32_t N_TIMES {100};
constexpr float32_t step = (2*PI) / N_TIMES;

int main() {
    test_init();
    
    // Preparing the array of samples
    std::array<float_t, N_TIMES> values = {0};
    for(int i = 0; i < N_TIMES; i++)
    {
        values[i] =  i*step;
    }  
    
    __disable_irq();

    for(;;)
    {
        test_on();
        
        for(const auto& value : values)
        {
            //std::cos(value);
            //arm_cos_f32(value);
            //arm_sin_f32(value);
            std::sin(value);

        }
        
        test_off();
    }
}

@pattacini
Copy link
Member Author

pattacini commented Jan 24, 2024

To sum up, we learned that:

  • AMC is almost 3 times slower than AMCBLDC for these ops.
  • For AMCBLDC, STD and CMSIS perform similarly.
  • For AMC, STD requires $\approx 1$ $\mu s$ more than CMSIS per FOC instance (1 cos and 1 sin).

Important

I would switch back to STD, honestly, also to spare the safe-guard required by using CMSIS.

@pattacini pattacini linked a pull request Jan 24, 2024 that will close this issue
@pattacini
Copy link
Member Author

pattacini commented Jan 24, 2024

I would switch back to STD, honestly, also to spare the #73 (comment) required by using CMSIS.

Changed implemented in #75.
Closing

@mfussi66, we ought to update the new arch model accordingly.

cc @Nicogene @sgiraz

@pattacini pattacini reopened this Jan 26, 2024
@pattacini
Copy link
Member Author

Opened again for further investigations.

@sgiraz
Copy link
Contributor

sgiraz commented Jan 26, 2024

Hi guys,

Today @marcoaccame and I tried to analyze even further the measurements of sin function between amc2c and amcbldc.

The code

constexpr uint32_t N_TIMES {1};                 // or 100 in the last case
constexpr float32_t step = (2*PI) / N_TIMES;    // values from 0 to 2PI

int main() {
    test_init();
    
    // Preparing the array of samples
    std::array<float_t, N_TIMES> values = {0};
    for(int i = 0; i < N_TIMES; i++)
    {
        values[i] =  i*step;
    }  
    
    __disable_irq();

    for(;;)
    {
        test_on();
        for(int i = 0; i < N_TIMES; i++)
        {
            //std::cos(value);
            //arm_cos_f32(value);
            std::sin(values[i]);
            //arm_sin_f32(values[i]);
        }
        test_off();
    }
}

on amc2c in usec (I*STEP)

std::sin arm_sin_f32 optimization
3.08 2.90 -O0
0.54 1.62 balanced
0.58 1.64 fast
0.58 1.17 fast but divided by 100

on amcbldc in usec (I*STEP)

std::sin arm_sin_f32 optimization
1.40 1.25 -O0
1.26 0.58 balanced
1.27? 0.59 fast
1.27? 0.37 fast but divided by 100

Note

when processing using the std::xxx, I excluded the arm_cortexM4lf_math.lib from the project in order to avoid unexpected/unwanted sideffects

Doing the Math

amc2c_CPU_Ts = 1/200 = 0.005 us
amcbldc_CPU_Ts = 1/168 = 0.00595238095238095238095238095238 us

Considering the std::sin VS the arm_sin_f32 using the balanced optimization

$\frac{1.26}{0.54} = \frac{amcbldc CPU_{Ts}}{amc2c CPU_{Ts}}$

$1.26 = \left( \frac{200}{168} \right) \cdot 0.54$

$1.26 = ~0.64$

Considering this result, we believe that the amc2c is faster than the amcbldc.

cc @pattacini @Nicogene

@pattacini
Copy link
Member Author

pattacini commented Jan 26, 2024

The purpose of this issue is not to compare AMC vs. AMCBLDC but rather STD vs. CMSIS. In light of this, the results above seem to be contradictory.

How come CMSIS is far slower than STD on AMC?

A couple of more questions:

  • How do we justify the previous results with the latest one?
  • What is the optimization level we use by default in our FW?

@pattacini
Copy link
Member Author

AMC pure math ops can be faster than those of AMCBLDC but as soon as there is some access to other resources we know that AMC is slower than AMCBLDC.

To a certain extent, this can explain why CMSIS on AMC is slower than STD as CMSIS probably requires several accesses to memory.

@Nicogene
Copy link
Member

the results above seem to be contradictory.

In this sense, after the alignment at the standup, we decided to measure these timings pulling up and down pins and measuring the time elapsed with the oscilloscope.

cc @pattacini @marcoaccame @sgiraz

@sgiraz
Copy link
Contributor

sgiraz commented Jan 30, 2024

⚠️ [ WORK IN PROGRESS ] ⚠️

Today I started the measurements with the oscilloscope.
In agreement with @marcoaccame we decided to measure the FOC directly w/ and w/o the CMSIS enabled on amc2c.

Here is the code snippet:

    // TODO: set GPIO here (ON)
    //embot::hw::testpoint::on(tp1);                                                                                // OK
    //HAL_GPIO_WritePin(GPIOA, 0x0020, GPIO_PIN_SET);  // PORT(0), PIN(5), STATE(0=OFF (RESET), 1=ON (SET))           // OK
    GPIOA->BSRR = 0x0020;
    
    AMC_BLDC_step_FOC();
    
    // TODO: set GPIO here (OFF)
    //embot::hw::testpoint::off(tp1);
    //HAL_GPIO_WritePin(GPIOA, 0x0020, GPIO_PIN_RESET);  // PORT(0), PIN(5), STATE(0=OFF (RESET), 1=ON (SET))
    GPIOA->BSRR = (uint32_t)0x0020 << 16U;

amc2c FW@devel (using CMSIS, -Os balanced )

TEK00000

(14.80us w/ optimization -0fast)

amc2c FW@devel (using std::sin and std::cos, -Os balanced )

TEK00001

(14.20us w/ optimization -0fast)

Code changes in FOCInnerLoop.cpp:

-  q = arm_sin_f32(rtb_y);
+  q = std::sin(rtb_y);

...
-  rtb_Cos = arm_cos_f32(rtb_y);
+  rtb_Cos = std::cos(rtb_y);

Event viewer comparison

I also verified the measures with the EventViewer offered by Keil uVision (see embot::app::scope::SignalEViewer).
Using the same configuration as in the last case mentioned above: amc2c, std::sin, std::cos, -Os balanced, we got similar results:

image


amcbldc FW@devel (using CMSIS, -Os balanced )

TEK00002

amcbldc FW@devel (using std::sin and std::cos, -Os balanced )

TEK00003

cc @Nicogene

@pattacini
Copy link
Member Author

pattacini commented Feb 1, 2024

Let me try to recap what I got from a F2F alignment with @sgiraz.

  1. There's pretty much NO difference between using STD or CMSIS ops on AMCBLDC and AMC2C.
  2. On the AMCBLDC, the FOC is taking up $8$ $\mu s$ and NOT $17$ $\mu s$. The difference is represented by the ADC operations.
  3. FOC takes up $14$ $\mu s$ on AMC2C, longer than on AMCBLDC.
  • Point 1 completes this issue ✅ We go with STD. See FOC: use STD instead of CMSIS #75.
  • Follow-up question on Point 2: "Why does ADC last that long compared to FOC? Is there any margin for improvement?"
  • Follow-up question on Point 3: "Why is AMC2C slower than AMCBLDC?" We know this is an ongoing study and we're probably at the right point to contact ST.

cc @Nicogene @sgiraz @marcoaccame @maggia80 @mfussi66

@Nicogene
Copy link
Member

Nicogene commented Feb 2, 2024

I think that this kind of analysis went a little bit out of scope, the @sgiraz analysis allowed to understand that we are good at using the std sin/cos.
About the timings, I will open a follow-up issue for having a final measure verdict on the FOC duration of AMCBLDC

cc @sgiraz @pattacini @marcoaccame

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain-firmware Pertains to FW development domain-modeling Pertains to model-based design team-fix Activity performed by Team FIX type-board-amcbldc Relevant to the AMCBLDC development
Projects
None yet
4 participants