[WIP] mulhigh improvements #2123

fredrik-johansson · 2024-12-10T12:57:24Z

This is a work in progress.

- Add semi-hardcoded basecases for 10 <= n <= 13 using the n=9 basecase + mpn_addmul_1 + corrections.
- Add _flint_mpn_mulhigh_n_recursive which computes exactly the same result as _flint_mpn_mulhigh_basecase but does so by breaking into smaller muls and mulhighs. With more improvements, I think this approach could obsolete basecase, but it is not currently strictly faster.
- Note to Albin: if the assembly _flint_mpn_mulhigh_basecase were to assume n >= 10, start by calling the 9x9 subroutine, and update from there, that might obsolete all the above.
- Change _flint_mpn_mulhigh_n_mulders_recursive to add the diagonal corrections so that we compute the more accurate high product. This means that _flint_mpn_mulhigh_n_mulders can call _flint_mpn_mulhigh_n_mulders_recursive directly with length n; we no longer need to zero-pad the inputs and work with n + 1.
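For readers unfamiliar with high products, here is a minimal toy model of the idea, not FLINT's code. It uses 32-bit limbs so that a limb product fits in a `uint64_t`, and every name in it (`limb_t`, `toy_add_at`, `toy_mul_full`, `toy_mulhigh`, `toy_high_error`) is hypothetical. Which diagonals the real basecase includes is not stated above; summing the terms with i + j >= n - 2 is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy limb: 32 bits, so one limb product fits in a uint64_t.
   This is a model only, not FLINT's limb type. */
typedef uint32_t limb_t;

/* Add the 64-bit value v into acc[pos..len-1] (little-endian limbs),
   propagating carries upward. */
static void toy_add_at(limb_t *acc, size_t len, size_t pos, uint64_t v)
{
    while (v != 0 && pos < len)
    {
        uint64_t s = (uint64_t) acc[pos] + (uint32_t) v;
        acc[pos] = (limb_t) s;
        v = (v >> 32) + (s >> 32);
        pos++;
    }
}

/* Exact schoolbook product: res[0..2n-1] = a * b. */
static void toy_mul_full(limb_t *res, const limb_t *a, const limb_t *b, size_t n)
{
    assert(n >= 1);
    memset(res, 0, 2 * n * sizeof(limb_t));
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            toy_add_at(res, 2 * n, i + j, (uint64_t) a[i] * b[j]);
}

/* Truncated product: only the terms with i + j >= n - 2 are summed
   (assumed choice of diagonals). The approximate high part is
   res[n..2n-1]; the limbs below n hold partial sums only. */
static void toy_mulhigh(limb_t *res, const limb_t *a, const limb_t *b, size_t n)
{
    assert(n >= 1);
    memset(res, 0, 2 * n * sizeof(limb_t));
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (i + j + 2 >= n) /* written this way to avoid underflow of n - 2 */
                toy_add_at(res, 2 * n, i + j, (uint64_t) a[i] * b[j]);
}

/* Return exact[n..2n-1] - approx[n..2n-1], asserting that the difference
   is nonnegative and fits in a single limb. */
static uint64_t toy_high_error(const limb_t *exact, const limb_t *approx, size_t n)
{
    uint64_t borrow = 0, low = 0;
    for (size_t k = 0; k < n; k++)
    {
        uint64_t x = exact[n + k];
        uint64_t y = (uint64_t) approx[n + k] + borrow;
        limb_t d = (limb_t) (x - y);
        borrow = (x < y);
        if (k == 0)
            low = d;
        else
            assert(d == 0);
    }
    assert(borrow == 0);
    return low;
}
```

In this model the omitted terms sum to less than B^n (with B = 2^32 and any practical n), so the high halves of the exact and truncated products differ by at most one unit in the lowest high limb. Dropping the i + j = n - 2 diagonal as well would save work at the cost of a larger error, which is the kind of trade-off the diagonal corrections address.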

    n  OLD       NEW       speedup
    8  1.31e-08  1.31e-08  1.000x
    9  1.56e-08  1.56e-08  1.000x
   10  3.17e-08  2.12e-08  1.495x
   11  3.59e-08  2.74e-08  1.310x
   12  4.04e-08  3.48e-08  1.161x
   13  4.63e-08  4.22e-08  1.097x
   14  5.24e-08  4.69e-08  1.117x
   15   5.9e-08  5.13e-08  1.150x
   16  6.55e-08  5.84e-08  1.122x
   17  7.26e-08     6e-08  1.210x
   18  8.03e-08  6.82e-08  1.177x
   19  8.82e-08  8.27e-08  1.067x
   20   9.7e-08  9.31e-08  1.042x
   21  1.05e-07  1.03e-07  1.019x
   22  1.14e-07  1.14e-07  1.000x
   23  1.23e-07  1.24e-07  0.992x
   24  1.33e-07  1.33e-07  1.000x
   25  1.43e-07  1.43e-07  1.000x
   26  1.53e-07  1.54e-07  0.994x
   27  1.64e-07  1.65e-07  0.994x
   28  1.76e-07  1.76e-07  1.000x
   29  1.87e-07  1.87e-07  1.000x
   30  1.99e-07  1.99e-07  1.000x
   31  2.12e-07  2.11e-07  1.005x
   32  2.24e-07  2.24e-07  1.000x
   33  2.35e-07  2.38e-07  0.987x
   34   2.5e-07   2.5e-07  1.000x
   35  2.65e-07  2.64e-07  1.004x
   36  2.77e-07  2.78e-07  0.996x
   37  2.92e-07  2.92e-07  1.000x
   38  3.06e-07  3.07e-07  0.997x
   39  3.22e-07  3.21e-07  1.003x
   40  3.37e-07  3.37e-07  1.000x
   42   3.7e-07  3.46e-07  1.069x
   44  4.03e-07  3.76e-07  1.072x
   46  4.35e-07  4.27e-07  1.019x
   48  4.73e-07  4.39e-07  1.077x
   50   5.1e-07  4.68e-07  1.090x
   52  5.25e-07  5.08e-07  1.033x
   54  5.79e-07  5.29e-07  1.095x
   56  5.92e-07  5.74e-07  1.031x
   58  6.32e-07  6.05e-07  1.045x
   60  6.84e-07  6.63e-07  1.032x
   63  7.39e-07  7.27e-07  1.017x
   66  8.25e-07  7.56e-07  1.091x
   69  8.91e-07  8.39e-07  1.062x
   72  9.51e-07   9.1e-07  1.045x
   75  1.02e-06  9.74e-07  1.047x
   78   1.1e-06  1.06e-06  1.038x
   81  1.14e-06  1.12e-06  1.018x
   85  1.25e-06  1.22e-06  1.025x
   89  1.35e-06   1.3e-06  1.038x
   93  1.44e-06  1.38e-06  1.043x
   97  1.53e-06   1.5e-06  1.020x
  101  1.64e-06  1.58e-06  1.038x
  106  1.79e-06  1.78e-06  1.006x
  111  1.91e-06  1.87e-06  1.021x
  116  2.02e-06  2.05e-06  0.985x
  121  2.26e-06  2.16e-06  1.046x
  127  2.41e-06  2.39e-06  1.008x
  133  2.63e-06  2.57e-06  1.023x
  139  2.81e-06  2.76e-06  1.018x
  145  2.96e-06  2.97e-06  0.997x
  152   3.2e-06  3.25e-06  0.985x
  159  3.52e-06  3.49e-06  1.009x
  166  3.71e-06  3.69e-06  1.005x
  174  4.09e-06  3.93e-06  1.041x
  182  4.33e-06  4.21e-06  1.029x
  191  4.74e-06  4.75e-06  0.998x
  200  5.14e-06  5.03e-06  1.022x
  210  5.43e-06  5.45e-06  0.996x
  220  5.96e-06  5.81e-06  1.026x
  231  6.39e-06  6.33e-06  1.009x
  242  6.92e-06  6.96e-06  0.994x
  254  7.61e-06  7.36e-06  1.034x
  266  8.34e-06  8.37e-06  0.996x
  279     9e-06  8.83e-06  1.019x
  292  9.55e-06  9.53e-06  1.002x
  306  1.07e-05  1.07e-05  1.000x
  321  1.13e-05  1.17e-05  0.966x
  337  1.23e-05  1.26e-05  0.976x
  353  1.34e-05  1.37e-05  0.978x
  370  1.46e-05  1.47e-05  0.993x
  388  1.53e-05  1.57e-05  0.975x
  407  1.66e-05  1.68e-05  0.988x
  427  1.75e-05  1.79e-05  0.978x
  448  1.94e-05  1.87e-05  1.037x
  470  2.03e-05  2.06e-05  0.985x
  493  2.26e-05  2.17e-05  1.041x

fredrik-johansson · 2024-12-11T12:20:18Z

It looks harder to extend the hardcoded table for sqrhigh efficiently from C, at least if one wants more than a 10% or so improvement.

One can try an (n-2)x(n-2) sqrhigh together with an nx1 mul, or perhaps an (n/2) sqr together with an (n/2) mulhigh. In either case, the problem is that combining the results is awkward, especially since the mul has to be doubled.
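To illustrate where the doubling comes from, here is a toy sketch of a full square (not a sqrhigh, and not FLINT's code) that exploits symmetry: each off-diagonal product a[i]*a[j] with i < j occurs twice in the square, so the off-diagonal sum is computed once and doubled before the diagonal squares are added. The 32-bit limbs and all names (`limb_t`, `toy_add_at`, `toy_mul_full`, `toy_sqr_sym`) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy limb: 32 bits, so one limb product fits in a uint64_t. */
typedef uint32_t limb_t;

/* Add the 64-bit value v into acc[pos..len-1] (little-endian limbs),
   propagating carries upward. */
static void toy_add_at(limb_t *acc, size_t len, size_t pos, uint64_t v)
{
    while (v != 0 && pos < len)
    {
        uint64_t s = (uint64_t) acc[pos] + (uint32_t) v;
        acc[pos] = (limb_t) s;
        v = (v >> 32) + (s >> 32);
        pos++;
    }
}

/* Reference: exact schoolbook product res[0..2n-1] = a * b. */
static void toy_mul_full(limb_t *res, const limb_t *a, const limb_t *b, size_t n)
{
    assert(n >= 1);
    memset(res, 0, 2 * n * sizeof(limb_t));
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            toy_add_at(res, 2 * n, i + j, (uint64_t) a[i] * b[j]);
}

/* Square using symmetry: res[0..2n-1] = a^2. */
static void toy_sqr_sym(limb_t *res, const limb_t *a, size_t n)
{
    assert(n >= 1);
    memset(res, 0, 2 * n * sizeof(limb_t));

    /* Step 1: sum each off-diagonal product once (i < j). */
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            toy_add_at(res, 2 * n, i + j, (uint64_t) a[i] * a[j]);

    /* Step 2: double the off-diagonal sum with a one-bit left shift. */
    limb_t carry = 0;
    for (size_t k = 0; k < 2 * n; k++)
    {
        limb_t t = res[k];
        res[k] = (limb_t) ((t << 1) | carry);
        carry = t >> 31;
    }
    assert(carry == 0); /* the doubled sum is at most a^2 < B^(2n) */

    /* Step 3: add the diagonal squares a[i]^2 at position 2i. */
    for (size_t i = 0; i < n; i++)
        toy_add_at(res, 2 * n, 2 * i, (uint64_t) a[i] * a[i]);
}
```

The doubling is a separate pass over the whole partial result here. When the pieces come from different subroutines, as in the splits considered above, only the cross-term part must be doubled, which is plausibly what makes the combination step awkward.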

fredrik-johansson added 2 commits December 10, 2024 13:24

wip mulhigh improvements

4a06ecc

include diagonals in sqrhigh_mulders

b5676c7
