[WIP] mulhigh improvements #2123

Open
wants to merge 2 commits into
base: main

fredrik-johansson
Collaborator

This is a work in progress.

  • Add semi-hardcoded basecases for 10 <= n <= 13, built from the n = 9 basecase plus mpn_addmul_1 calls and corrections.

  • Add _flint_mpn_mulhigh_n_recursive, which computes exactly the same result as _flint_mpn_mulhigh_basecase but does so by splitting the work into smaller muls and mulhighs. With more improvements, I think this approach could make the basecase obsolete, but it is not currently faster in all cases.

  • Note to Albin: if the assembly _flint_mpn_mulhigh_basecase were to assume n >= 10, start by calling the 9x9 subroutine, and update the result from there, that might make all of the above obsolete.

  • Change _flint_mpn_mulhigh_n_mulders_recursive to add the diagonal corrections so that we compute the more accurate high product. This means that _flint_mpn_mulhigh_n_mulders can call _flint_mpn_mulhigh_n_mulders_recursive directly with length n; we no longer need to zero-pad the inputs and work with n + 1.
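To make the decompositions above concrete, here is a minimal toy sketch. It is not the code in this PR and does not use FLINT or GMP: it uses 32-bit limbs, accumulates into a plain 2n-limb buffer, and assumes that the basecase keeps exactly the partial products a[i]*b[j] with i + j >= n - 1. All names (`limb_t`, `add_product_at`, `toy_mul`, `toy_mulhigh_basecase`, `toy_mulhigh_extend`, `toy_mulhigh_recursive`) are hypothetical. `toy_mulhigh_extend` shows one possible way to get length n from a length n-1 basecase plus a single row of the kind mpn_addmul_1 would compute; `toy_mulhigh_recursive` shows an even split into one full mul and two half-length mulhighs. Both reproduce the same set of terms as the reference loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* toy 32-bit limb so products fit in uint64_t */

/* acc += x * y * B^pos, with B = 2^32; carries past len limbs are dropped. */
static void add_product_at(limb_t *acc, size_t len, size_t pos,
                           limb_t x, limb_t y)
{
    uint64_t carry = (uint64_t) x * y;
    for (size_t k = pos; k < len && carry != 0; k++)
    {
        uint64_t t = (uint64_t) acc[k] + (uint32_t) carry;
        acc[k] = (limb_t) t;
        carry = (carry >> 32) + (t >> 32);
    }
}

/* Full product: acc += B^off * a * b. */
static void toy_mul(limb_t *acc, size_t len, size_t off,
                    const limb_t *a, const limb_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            add_product_at(acc, len, off + i + j, a[i], b[j]);
}

/* Reference: acc += B^off * sum of a[i]*b[j]*B^(i+j) over i + j >= n - 1. */
static void toy_mulhigh_basecase(limb_t *acc, size_t len, size_t off,
                                 const limb_t *a, const limb_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = n - 1 - i; j < n; j++)
            add_product_at(acc, len, off + i + j, a[i], b[j]);
}

/* Length n from length n - 1: the shorter basecase on a[0..n-2] and
   b[1..n-1] covers every kept term with i <= n - 2; the remaining terms
   are the single row a[n-1] * b, which is an addmul_1-style update. */
static void toy_mulhigh_extend(limb_t *acc, size_t len, size_t off,
                               const limb_t *a, const limb_t *b, size_t n)
{
    assert(n >= 1);
    toy_mulhigh_basecase(acc, len, off + 1, a, b + 1, n - 1);
    for (size_t j = 0; j < n; j++)
        add_product_at(acc, len, off + n - 1 + j, a[n - 1], b[j]);
}

/* Even n = 2h: a_hi * b_hi is needed in full, while a_lo * b_hi and
   a_hi * b_lo each contribute only a length-h mulhigh. Odd or small n
   falls back to the reference loop in this toy version. */
static void toy_mulhigh_recursive(limb_t *acc, size_t len, size_t off,
                                  const limb_t *a, const limb_t *b, size_t n)
{
    if (n < 4 || n % 2 != 0)
    {
        toy_mulhigh_basecase(acc, len, off, a, b, n);
        return;
    }
    size_t h = n / 2;
    toy_mul(acc, len, off + n, a + h, b + h, h);
    toy_mulhigh_recursive(acc, len, off + h, a, b + h, h);
    toy_mulhigh_recursive(acc, len, off + h, a + h, b, h);
}
```

Each function adds into a zero-initialized buffer of 2n limbs; limbs n..2n-1 then hold the high product and limb n-1 holds the extra low limb. How the real routines handle that low limb and the return value is not modeled here.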

    n  OLD       NEW       speedup
    8  1.31e-08  1.31e-08  1.000x
    9  1.56e-08  1.56e-08  1.000x
   10  3.17e-08  2.12e-08  1.495x
   11  3.59e-08  2.74e-08  1.310x
   12  4.04e-08  3.48e-08  1.161x
   13  4.63e-08  4.22e-08  1.097x
   14  5.24e-08  4.69e-08  1.117x
   15   5.9e-08  5.13e-08  1.150x
   16  6.55e-08  5.84e-08  1.122x
   17  7.26e-08     6e-08  1.210x
   18  8.03e-08  6.82e-08  1.177x
   19  8.82e-08  8.27e-08  1.067x
   20   9.7e-08  9.31e-08  1.042x
   21  1.05e-07  1.03e-07  1.019x
   22  1.14e-07  1.14e-07  1.000x
   23  1.23e-07  1.24e-07  0.992x
   24  1.33e-07  1.33e-07  1.000x
   25  1.43e-07  1.43e-07  1.000x
   26  1.53e-07  1.54e-07  0.994x
   27  1.64e-07  1.65e-07  0.994x
   28  1.76e-07  1.76e-07  1.000x
   29  1.87e-07  1.87e-07  1.000x
   30  1.99e-07  1.99e-07  1.000x
   31  2.12e-07  2.11e-07  1.005x
   32  2.24e-07  2.24e-07  1.000x
   33  2.35e-07  2.38e-07  0.987x
   34   2.5e-07   2.5e-07  1.000x
   35  2.65e-07  2.64e-07  1.004x
   36  2.77e-07  2.78e-07  0.996x
   37  2.92e-07  2.92e-07  1.000x
   38  3.06e-07  3.07e-07  0.997x
   39  3.22e-07  3.21e-07  1.003x
   40  3.37e-07  3.37e-07  1.000x
   42   3.7e-07  3.46e-07  1.069x
   44  4.03e-07  3.76e-07  1.072x
   46  4.35e-07  4.27e-07  1.019x
   48  4.73e-07  4.39e-07  1.077x
   50   5.1e-07  4.68e-07  1.090x
   52  5.25e-07  5.08e-07  1.033x
   54  5.79e-07  5.29e-07  1.095x
   56  5.92e-07  5.74e-07  1.031x
   58  6.32e-07  6.05e-07  1.045x
   60  6.84e-07  6.63e-07  1.032x
   63  7.39e-07  7.27e-07  1.017x
   66  8.25e-07  7.56e-07  1.091x
   69  8.91e-07  8.39e-07  1.062x
   72  9.51e-07   9.1e-07  1.045x
   75  1.02e-06  9.74e-07  1.047x
   78   1.1e-06  1.06e-06  1.038x
   81  1.14e-06  1.12e-06  1.018x
   85  1.25e-06  1.22e-06  1.025x
   89  1.35e-06   1.3e-06  1.038x
   93  1.44e-06  1.38e-06  1.043x
   97  1.53e-06   1.5e-06  1.020x
  101  1.64e-06  1.58e-06  1.038x
  106  1.79e-06  1.78e-06  1.006x
  111  1.91e-06  1.87e-06  1.021x
  116  2.02e-06  2.05e-06  0.985x
  121  2.26e-06  2.16e-06  1.046x
  127  2.41e-06  2.39e-06  1.008x
  133  2.63e-06  2.57e-06  1.023x
  139  2.81e-06  2.76e-06  1.018x
  145  2.96e-06  2.97e-06  0.997x
  152   3.2e-06  3.25e-06  0.985x
  159  3.52e-06  3.49e-06  1.009x
  166  3.71e-06  3.69e-06  1.005x
  174  4.09e-06  3.93e-06  1.041x
  182  4.33e-06  4.21e-06  1.029x
  191  4.74e-06  4.75e-06  0.998x
  200  5.14e-06  5.03e-06  1.022x
  210  5.43e-06  5.45e-06  0.996x
  220  5.96e-06  5.81e-06  1.026x
  231  6.39e-06  6.33e-06  1.009x
  242  6.92e-06  6.96e-06  0.994x
  254  7.61e-06  7.36e-06  1.034x
  266  8.34e-06  8.37e-06  0.996x
  279     9e-06  8.83e-06  1.019x
  292  9.55e-06  9.53e-06  1.002x
  306  1.07e-05  1.07e-05  1.000x
  321  1.13e-05  1.17e-05  0.966x
  337  1.23e-05  1.26e-05  0.976x
  353  1.34e-05  1.37e-05  0.978x
  370  1.46e-05  1.47e-05  0.993x
  388  1.53e-05  1.57e-05  0.975x
  407  1.66e-05  1.68e-05  0.988x
  427  1.75e-05  1.79e-05  0.978x
  448  1.94e-05  1.87e-05  1.037x
  470  2.03e-05  2.06e-05  0.985x
  493  2.26e-05  2.17e-05  1.041x

Collaborator Author

fredrik-johansson commented Dec 11, 2024

It looks harder to extend the hardcoded table for sqrhigh efficiently from C, at least if one wants an improvement of more than 10% or so.

One could try an (n-2) x (n-2) sqrhigh together with an n x 1 mul, or perhaps an (n/2) sqr together with an (n/2) mulhigh. In either case, combining the results is awkward, especially since the mul has to be doubled.
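The doubling issue can be seen in a small toy sketch. It is not FLINT code: it uses 32-bit limbs, a plain 2n-limb buffer, and assumes the high product keeps the terms with i + j >= n - 1. The names `limb_t`, `add_product_at`, `toy_mulhigh_basecase` and `toy_sqrhigh` are hypothetical. In a square, each off-diagonal product a[i]*a[j] with i < j appears twice, so it is computed once and doubled, while the diagonal squares a[i]^2 are added only once. Any scheme that mixes a smaller sqrhigh with separate mul rows has to respect the same split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* toy 32-bit limb so products fit in uint64_t */

/* acc += x * y * B^pos, with B = 2^32; carries past len limbs are dropped. */
static void add_product_at(limb_t *acc, size_t len, size_t pos,
                           limb_t x, limb_t y)
{
    uint64_t carry = (uint64_t) x * y;
    for (size_t k = pos; k < len && carry != 0; k++)
    {
        uint64_t t = (uint64_t) acc[k] + (uint32_t) carry;
        acc[k] = (limb_t) t;
        carry = (carry >> 32) + (t >> 32);
    }
}

/* Reference: acc += sum of a[i]*b[j]*B^(i+j) over i + j >= n - 1. */
static void toy_mulhigh_basecase(limb_t *acc, size_t len,
                                 const limb_t *a, const limb_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = n - 1 - i; j < n; j++)
            add_product_at(acc, len, i + j, a[i], b[j]);
}

/* High part of a^2. acc must have at least 2n limbs and be zero on entry,
   because the whole buffer is doubled in the middle step. */
static void toy_sqrhigh(limb_t *acc, size_t len, const limb_t *a, size_t n)
{
    assert(len >= 2 * n);

    /* Off-diagonal terms with i < j and i + j >= n - 1, each taken once. */
    for (size_t i = 0; i < n; i++)
    {
        size_t start = (n - 1 - i > i + 1) ? n - 1 - i : i + 1;
        for (size_t j = start; j < n; j++)
            add_product_at(acc, len, i + j, a[i], a[j]);
    }

    /* Double the off-diagonal sum: a one-bit left shift across all limbs. */
    limb_t top = 0;
    for (size_t k = 0; k < len; k++)
    {
        limb_t next = acc[k] >> 31;
        acc[k] = (limb_t) (((uint64_t) acc[k] << 1) | top);
        top = next;
    }

    /* Diagonal squares a[i]^2 with 2i >= n - 1, added without doubling. */
    for (size_t i = 0; i < n; i++)
        if (2 * i + 1 >= n)
            add_product_at(acc, len, 2 * i, a[i], a[i]);
}
```

With a zeroed buffer, `toy_sqrhigh(r, 2 * n, a, n)` gives the same limbs as `toy_mulhigh_basecase(r, 2 * n, a, a, n)`. The saving comes from skipping roughly half of the off-diagonal products, which is what makes a piecewise construction from C harder to keep efficient.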
