Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ILP64 Accelerate #113

Merged
merged 4 commits into from
Apr 5, 2023
Merged

Add support for ILP64 Accelerate #113

merged 4 commits into from
Apr 5, 2023

Conversation

staticfloat
Copy link
Member

The new Accelerate released in macOS v13.3 provides two new interfaces; an upgraded LAPACK for the LP64 interface, and an all-new ILP64 interface (that uses the same upgraded LAPACK). These symbols are available from Accelerate with the suffix $NEWLAPACK and $NEWLAPACK$ILP64, respectively.

Unfortunately, this is not a "true" suffix, as Apple has decided to drop the trailing underscore from the typical F77 names, meaning that a symbol such as dgemm_ gets mangled to dgemm$NEWLAPACK, whereas a CBLAS symbol such as cblas_zdotc_sub gets mangled to cblas_zdotc_sub$NEWLAPACK. This means that we need to selectively erase the trailing underscore from some symbols when applying this Accelerate suffix.

To do this, we add a new feature, enabled by default only on Apple builds, called SYMBOL_TRIMMING, which allows a suffix_hint to contain the ASCII "substitution character" 0x1a as the first character of the suffix hint to mean "remove a trailing underscore when applying this suffix".

To make dealing with suffix hints easier for command-line users, these suffix hints are available for use in LBT_BACKING_LIBS by listing libraries separated by suffix hints with an exclamation point, e.g. libname!suffix.

The new Accelerate released in macOS v13.3 provides two new interfaces;
an upgraded LAPACK for the LP64 interface, and an all-new ILP64
interface (that uses the same upgraded LAPACK).  These symbols are
available from Accelerate with the suffix `$NEWLAPACK` and
`$NEWLAPACK$ILP64`, respectively.

Unfortunately, this is not a "true" suffix, as Apple has decided to drop
the trailing underscore from the typical F77 names, meaning that a
symbol such as `dgemm_` gets mangled to `dgemm$NEWLAPACK`, whereas a
CBLAS symbol such as `cblas_zdotc_sub` gets mangled to
`cblas_zdotc_sub$NEWLAPACK`.  This means that we need to selectively
erase the trailing underscore from some symbols when applying this
Accelerate suffix.

To do this, we add a new feature, enabled by default only on Apple
builds, called `SYMBOL_TRIMMING`, which allows a `suffix_hint` to
contain the ASCII "substitution character" `0x1a` as the first character
of the suffix hint to mean "remove a trailing underscore when applying
this suffix".

To make dealing with suffix hints easier for command-line users, these
suffix hints are available for use in `LBT_BACKING_LIBS` by listing
libraries separated by suffix hints with an exclamation point,  e.g.
`libname!suffix`.
@staticfloat
Copy link
Member Author

staticfloat commented Mar 29, 2023

To test this branch out on your local Julia, run the following script:

JULIA_PREFIX=/path/to/julia
# Back up normal LBT
for f in ${JULIA_PREFIX}/lib/julia/libblastrampoline*; do
    cp -v ${f} ${f}.backup
done

# Build new LBT
cd src
make

# Install new LBT into Julia's lib/julia directory
cp build/libblastrampoline.5.dylib ${JULIA_PREFIX}/lib/julia/libblastrampoline.5.dylib
cp build/libblastrampoline.5.dylib ${JULIA_PREFIX}/lib/julia/libblastrampoline.dylib

Then, when you launch Julia, load Accelerate via:

using LinearAlgebra

# Load LP64 interface first
BLAS.lbt_forward("/System/Library/Frameworks/Accelerate.framework/Accelerate"; suffix_hint="\x1a\$NEWLAPACK", verbose=true, clear=true)

# Load ILP64 interface next
BLAS.lbt_forward("/System/Library/Frameworks/Accelerate.framework/Accelerate"; suffix_hint="\x1a\$NEWLAPACK\$ILP64", verbose=true)

# Ensure that an ILP64 interface was actually loaded:
config = BLAS.get_config()
if !any(lib.interface == :ilp64 for lib in config.loaded_libs)
    @error("No ILP64 interfaces found; are you sure you're on macOS 13.3 or higher?!")
end

# Show LBT config:
display(BLAS.get_config())

@ViralBShah
Copy link
Collaborator

Verified that this works for me, and for peakflops() gives twice the flop rate.

@staticfloat
Copy link
Member Author

Can someone try running the LinearAlgebra tests with Accelerate loaded?

@ViralBShah
Copy link
Collaborator

Results of the LinearAlgebra tests. Looks pretty good. The get_num_threads needs adjusting for Accelerate, and very few tests are failing.

➜  julia git:(vs/accelerate64) ✗ ./julia test/runtests.jl LinearAlgebra         
Running parallel tests with:
  nworkers() = 8
  nthreads() = 1
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 32.000 GiB
  Sys.free_memory() = 12.930 GiB

Test                          (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
LinearAlgebra/matmul               (4) |        started at 2023-04-01T23:41:15.776
LinearAlgebra/diagonal             (7) |        started at 2023-04-01T23:41:15.820
LinearAlgebra/addmul               (2) |        started at 2023-04-01T23:41:15.861
LinearAlgebra/special              (8) |        started at 2023-04-01T23:41:15.861
LinearAlgebra/dense                (5) |        started at 2023-04-01T23:41:15.862
LinearAlgebra/triangular           (3) |        started at 2023-04-01T23:41:15.862
LinearAlgebra/symmetric            (6) |        started at 2023-04-01T23:41:15.862
LinearAlgebra/bidiag               (9) |        started at 2023-04-01T23:41:15.862
LinearAlgebra/special              (8) |    83.84 |   2.54 |  3.0 |   13877.34 |  1423.22
LinearAlgebra/qr                   (8) |        started at 2023-04-01T23:42:39.839
LinearAlgebra/bidiag               (9) |    90.00 |   2.85 |  3.2 |   13317.18 |  1635.91
LinearAlgebra/cholesky             (9) |        started at 2023-04-01T23:42:45.923
LinearAlgebra/dense                (5) |   115.04 |   4.13 |  3.6 |   17711.68 |  1667.75
LinearAlgebra/blas                 (5) |        started at 2023-04-01T23:43:10.965
LinearAlgebra/diagonal             (7) |   119.24 |   4.46 |  3.7 |   18937.22 |  1695.06
LinearAlgebra/lu                   (7) |        started at 2023-04-01T23:43:15.170
LinearAlgebra/qr                   (8) |    42.00 |   1.19 |  2.8 |    6571.17 |  2135.47
LinearAlgebra/uniformscaling       (8) |        started at 2023-04-01T23:43:21.846
LinearAlgebra/cholesky             (9) |         failed at 2023-04-01T23:43:26.173
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
  Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)

Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
  Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)

Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
  Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)

Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
  Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)

Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
  Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)


LinearAlgebra/structuredbroadcast (10) |        started at 2023-04-01T23:43:28.313
LinearAlgebra/blas                 (5) |         failed at 2023-04-01T23:43:30.921
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/blas.jl:678
  Expression: default > 0
   Evaluated: 0 > 0

Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/blas.jl:680
  Expression: BLAS.get_num_threads() === 1
   Evaluated: 0 === 1


LinearAlgebra/symmetric            (6) |   136.01 |   3.62 |  2.7 |   19231.91 |  1740.31
LinearAlgebra/hessenberg           (6) |        started at 2023-04-01T23:43:31.953
LinearAlgebra/svd                 (11) |        started at 2023-04-01T23:43:32.994
LinearAlgebra/matmul               (4) |   140.62 |   4.53 |  3.2 |   21742.65 |  1389.61
LinearAlgebra/eigen                (4) |        started at 2023-04-01T23:43:36.533
LinearAlgebra/structuredbroadcast (10) |    17.92 |   0.77 |  4.3 |    3234.67 |   613.28
LinearAlgebra/tridiag             (10) |        started at 2023-04-01T23:43:46.360
LinearAlgebra/uniformscaling       (8) |    25.05 |   1.04 |  4.2 |    3629.15 |  2428.69
LinearAlgebra/lapack               (8) |        started at 2023-04-01T23:43:46.899
LinearAlgebra/hessenberg           (6) |    16.74 |   0.53 |  3.2 |    2489.17 |  1923.83
LinearAlgebra/lq                   (6) |        started at 2023-04-01T23:43:48.694
LinearAlgebra/lu                   (7) |    41.35 |   1.17 |  2.8 |    6116.89 |  2578.94
LinearAlgebra/adjtrans             (7) |        started at 2023-04-01T23:43:56.532
LinearAlgebra/lapack               (8) |    13.14 |   0.34 |  2.6 |    1572.33 |  2486.33
LinearAlgebra/generic              (8) |        started at 2023-04-01T23:44:00.046
LinearAlgebra/lq                   (6) |    16.15 |   0.49 |  3.0 |    2499.47 |  2104.75
LinearAlgebra/schur                (6) |        started at 2023-04-01T23:44:04.843
LinearAlgebra/eigen                (4) |    28.47 |   1.04 |  3.7 |    4258.50 |  1654.45
LinearAlgebra/bunchkaufman         (4) |        started at 2023-04-01T23:44:05.008
LinearAlgebra/svd                 (11) |    32.77 |   1.21 |  3.7 |    5443.56 |   807.22
LinearAlgebra/givens              (11) |        started at 2023-04-01T23:44:05.893
LinearAlgebra/adjtrans             (7) |    11.40 |   0.70 |  6.1 |    2016.61 |  2931.66
LinearAlgebra/pinv                 (7) |        started at 2023-04-01T23:44:07.945
LinearAlgebra/givens              (11) |     3.25 |   0.15 |  4.5 |     480.30 |  1011.66
LinearAlgebra/factorization       (11) |        started at 2023-04-01T23:44:09.141
LinearAlgebra/factorization       (11) |     1.58 |   0.03 |  1.9 |     250.17 |  1033.64
LinearAlgebra/abstractq           (11) |        started at 2023-04-01T23:44:10.736
LinearAlgebra/pinv                 (7) |     3.67 |   0.24 |  6.4 |     817.49 |  3158.14
LinearAlgebra/ldlt                 (7) |        started at 2023-04-01T23:44:11.615
LinearAlgebra/ldlt                 (7) |     0.31 |   0.00 |  0.0 |      47.32 |  3160.14
LinearAlgebra/abstractq           (11) |         failed at 2023-04-01T23:44:12.844
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/abstractq.jl:59
  Expression: Q[:, 1:3] == Q.Q[:, 1:3] == (Matrix(Q))[:, 1:3]
   Evaluated: [-0.35953221510956435 0.1965454404197131 0.519592431137261; -0.22582192557737624 -0.6725058409923694 0.6076020971223726; … ; -0.09448452509982379 -0.655284066450569 -0.5071271232591662; -0.5324407150637704 0.28215697975293297 -0.10211308565740215] == [-0.35953221510956435 0.1965454404197131 0.519592431137261; -0.22582192557737624 -0.6725058409923694 0.6076020971223726; … ; -0.09448452509982379 -0.655284066450569 -0.5071271232591662; -0.5324407150637704 0.28215697975293297 -0.10211308565740215] == [-0.35953221510956435 0.1965454404197131 0.5195924311372611; -0.22582192557737624 -0.6725058409923694 0.6076020971223726; … ; -0.09448452509982379 -0.655284066450569 -0.5071271232591662; -0.5324407150637704 0.28215697975293297 -0.10211308565740214]

Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/abstractq.jl:59
  Expression: Q[:, 1:3] == Q.Q[:, 1:3] == (Matrix(Q))[:, 1:3]
   Evaluated: ComplexF64[-0.49837132037257037 - 0.4782120069360776im -0.150972588569905 - 0.28336417274098175im 0.11710459089697636 + 0.24599737894264181im; -0.5297065808419519 - 0.2180434544860795im -0.258256476833034 + 0.09563123480722824im -0.269175148885196 - 0.16302309774541907im; … ; -0.27303350965655987 - 0.2065902594255449im 0.27335708885184995 + 0.3499871631428588im 0.4250421451010471 - 0.06661317167629388im; -0.05787258293172755 - 0.2426695859781627im 0.5018747549963603 + 0.4058519997055557im -0.016960609747618538 - 0.398499749909367im] == ComplexF64[-0.49837132037257037 - 0.4782120069360776im -0.150972588569905 - 0.28336417274098175im 0.11710459089697636 + 0.24599737894264181im; -0.5297065808419519 - 0.2180434544860795im -0.258256476833034 + 0.09563123480722824im -0.269175148885196 - 0.16302309774541907im; … ; -0.27303350965655987 - 0.2065902594255449im 0.27335708885184995 + 0.3499871631428588im 0.4250421451010471 - 0.06661317167629388im; -0.05787258293172755 - 0.2426695859781627im 0.5018747549963603 + 0.4058519997055557im -0.016960609747618538 - 0.398499749909367im] == ComplexF64[-0.49837132037257037 - 0.4782120069360776im -0.150972588569905 - 0.28336417274098175im 0.11710459089697636 + 0.24599737894264181im; -0.5297065808419519 - 0.2180434544860795im -0.258256476833034 + 0.09563123480722824im -0.269175148885196 - 0.16302309774541907im; … ; -0.27303350965655987 - 0.2065902594255449im 0.27335708885184995 + 0.3499871631428588im 0.4250421451010471 - 0.0666131716762939im; -0.05787258293172755 - 0.2426695859781627im 0.5018747549963603 + 0.4058519997055557im -0.016960609747618538 - 0.3984997499093669im]


LinearAlgebra/tridiag             (10) |    28.33 |   0.88 |  3.1 |    4643.44 |  1031.50
LinearAlgebra/generic              (8) |    17.30 |   0.60 |  3.5 |    2544.61 |  2719.42
LinearAlgebra/bunchkaufman         (4) |    12.82 |   0.54 |  4.2 |    1991.96 |  1759.48
LinearAlgebra/addmul               (2) |   191.61 |   4.16 |  2.2 |   22170.09 |  1365.72
LinearAlgebra/triangular           (3) |   220.06 |   7.61 |  3.5 |   33463.90 |  3482.58
LinearAlgebra/schur                (6) |    65.27 |   0.25 |  0.4 |    1307.90 |  2300.78

Test Summary:                       |  Pass  Fail  Broken  Total     Time
  Overall                           | 97516     9      17  97542  3m55.4s
    LinearAlgebra/special           |  3675                 3675  1m24.7s
    LinearAlgebra/bidiag            |  4828                 4828  1m30.8s
    LinearAlgebra/dense             |  8561                 8561  1m55.9s
    LinearAlgebra/diagonal          |  2945                 2945  2m00.1s
    LinearAlgebra/qr                |  4782                 4782    42.0s
    LinearAlgebra/cholesky          |  2506     5           2511    40.3s
    LinearAlgebra/blas              |  1628     2           1630    20.0s
    LinearAlgebra/symmetric         |  3000                 3000  2m16.8s
    LinearAlgebra/matmul            |  1496                 1496  2m21.4s
    LinearAlgebra/structuredbroadcast |   672                  672    18.4s
    LinearAlgebra/uniformscaling    |   446                  446    25.0s
    LinearAlgebra/hessenberg        |   667                  667    16.7s
    LinearAlgebra/lu                |  1380                 1380    41.4s
    LinearAlgebra/lapack            |   818                  818    13.2s
    LinearAlgebra/lq                |  2898                 2898    16.1s
    LinearAlgebra/eigen             |   951                  951    28.5s
    LinearAlgebra/svd               |   606                  606    33.2s
    LinearAlgebra/adjtrans          |   479                  479    11.4s
    LinearAlgebra/givens            |  1991                 1991     3.3s
    LinearAlgebra/factorization     |    85            16    101     1.6s
    LinearAlgebra/pinv              |   500                  500     3.7s
    LinearAlgebra/ldlt              |     8                    8     0.3s
    LinearAlgebra/abstractq         |   104     2            106     2.1s
    LinearAlgebra/tridiag           |  1621                 1621    28.4s
    LinearAlgebra/generic           |   720             1    721    17.3s
    LinearAlgebra/bunchkaufman      |  5689                 5689    12.8s
    LinearAlgebra/addmul            |  5688                 5688  3m12.5s
    LinearAlgebra/triangular        | 38272                38272  3m40.9s
    LinearAlgebra/schur             |   500                  500  1m05.3s
    FAILURE

The global RNG seed was 0x2d0c35f148fbf575674e2e79e883fe44.

Error in testset LinearAlgebra/cholesky:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
  Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)

Error in testset LinearAlgebra/cholesky:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
  Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)

Error in testset LinearAlgebra/cholesky:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
  Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)

Error in testset LinearAlgebra/cholesky:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
  Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)

Error in testset LinearAlgebra/cholesky:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
  Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)

Error in testset LinearAlgebra/blas:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/blas.jl:678
  Expression: default > 0
   Evaluated: 0 > 0

Error in testset LinearAlgebra/blas:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/blas.jl:680
  Expression: BLAS.get_num_threads() === 1
   Evaluated: 0 === 1

Error in testset LinearAlgebra/abstractq:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/abstractq.jl:59
  Expression: Q[:, 1:3] == Q.Q[:, 1:3] == (Matrix(Q))[:, 1:3]
   Evaluated: [-0.35953221510956435 0.1965454404197131 0.519592431137261; -0.22582192557737624 -0.6725058409923694 0.6076020971223726; … ; -0.09448452509982379 -0.655284066450569 -0.5071271232591662; -0.5324407150637704 0.28215697975293297 -0.10211308565740215] == [-0.35953221510956435 0.1965454404197131 0.519592431137261; -0.22582192557737624 -0.6725058409923694 0.6076020971223726; … ; -0.09448452509982379 -0.655284066450569 -0.5071271232591662; -0.5324407150637704 0.28215697975293297 -0.10211308565740215] == [-0.35953221510956435 0.1965454404197131 0.5195924311372611; -0.22582192557737624 -0.6725058409923694 0.6076020971223726; … ; -0.09448452509982379 -0.655284066450569 -0.5071271232591662; -0.5324407150637704 0.28215697975293297 -0.10211308565740214]

Error in testset LinearAlgebra/abstractq:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/abstractq.jl:59
  Expression: Q[:, 1:3] == Q.Q[:, 1:3] == (Matrix(Q))[:, 1:3]
   Evaluated: ComplexF64[-0.49837132037257037 - 0.4782120069360776im -0.150972588569905 - 0.28336417274098175im 0.11710459089697636 + 0.24599737894264181im; -0.5297065808419519 - 0.2180434544860795im -0.258256476833034 + 0.09563123480722824im -0.269175148885196 - 0.16302309774541907im; … ; -0.27303350965655987 - 0.2065902594255449im 0.27335708885184995 + 0.3499871631428588im 0.4250421451010471 - 0.06661317167629388im; -0.05787258293172755 - 0.2426695859781627im 0.5018747549963603 + 0.4058519997055557im -0.016960609747618538 - 0.398499749909367im] == ComplexF64[-0.49837132037257037 - 0.4782120069360776im -0.150972588569905 - 0.28336417274098175im 0.11710459089697636 + 0.24599737894264181im; -0.5297065808419519 - 0.2180434544860795im -0.258256476833034 + 0.09563123480722824im -0.269175148885196 - 0.16302309774541907im; … ; -0.27303350965655987 - 0.2065902594255449im 0.27335708885184995 + 0.3499871631428588im 0.4250421451010471 - 0.06661317167629388im; -0.05787258293172755 - 0.2426695859781627im 0.5018747549963603 + 0.4058519997055557im -0.016960609747618538 - 0.398499749909367im] == ComplexF64[-0.49837132037257037 - 0.4782120069360776im -0.150972588569905 - 0.28336417274098175im 0.11710459089697636 + 0.24599737894264181im; -0.5297065808419519 - 0.2180434544860795im -0.258256476833034 + 0.09563123480722824im -0.269175148885196 - 0.16302309774541907im; … ; -0.27303350965655987 - 0.2065902594255449im 0.27335708885184995 + 0.3499871631428588im 0.4250421451010471 - 0.0666131716762939im; -0.05787258293172755 - 0.2426695859781627im 0.5018747549963603 + 0.4058519997055557im -0.016960609747618538 - 0.3984997499093669im]

ERROR: LoadError: Test run finished with errors
in expression starting at /Users/viral/julia/test/runtests.jl:93

@ViralBShah
Copy link
Collaborator

In the QR tests, it is a difference on the order of eps.

julia> Q[:,1:3] - Matrix(Q)[:,1:3]
5×3 Matrix{Float64}:
 0.0  0.0  -5.55112e-17
 0.0  0.0  -1.38778e-17
 0.0  0.0   0.0
 0.0  0.0   0.0
 0.0  0.0   0.0

@ViralBShah
Copy link
Collaborator

ViralBShah commented Apr 2, 2023

The pivoted cholesky failure is reproduced as follows:

julia> using LinearAlgebra, Test

julia> apdh = Float32[3.4291134 -0.61112815 0.8155207 -0.9183448 -0.61426914 1.021518 -0.07258626 -1.9560734 -1.6641214 0.4579194; -0.61112815 3.1250076 -0.44400603 0.97749346 -0.20464422 0.4505959 -1.1918031 -1.0662653 -0.5070573 -0.06222758; 0.8155207 -0.44400603 0.5413656 0.53000593 -0.4061885 0.028969476 -0.45150727 -0.00913016 -0.61976415 -0.17616473; -0.9183448 0.97749346 0.53000593 5.108155 -0.44426316 -0.5661762 -1.5497217 0.012556124 -1.0134082 0.52672774; -0.61426914 -0.20464422 -0.4061885 -0.44426316 2.1013746 -0.0008714596 1.9995844 -0.80452967 0.81173474 1.0536224; 1.021518 0.4505959 0.028969476 -0.5661762 -0.0008714596 2.52233 0.46488014 -0.90563935 -0.44942248 -0.03283302; -0.07258626 -1.1918031 -0.45150727 -1.5497217 1.9995844 0.46488014 3.378583 -0.98002505 0.58866936 0.6275643; -1.9560734 -1.0662653 -0.00913016 0.012556124 -0.80452967 -0.90563935 -0.98002505 3.117008 1.2102712 -1.4144715; -1.6641214 -0.5070573 -0.61976415 -1.0134082 0.81173474 -0.44942248 0.58866936 1.2102712 1.9170706 0.44138947; 0.4579194 -0.06222758 -0.17616473 0.52672774 1.0536224 -0.03283302 0.6275643 -1.4144715 0.44138947 2.4111974]
10×10 Matrix{Float32}:
  3.42911    -0.611128    0.815521    -0.918345   -0.614269     1.02152     -0.0725863  -1.95607     -1.66412    0.457919
 -0.611128    3.12501    -0.444006     0.977493   -0.204644     0.450596    -1.1918     -1.06627     -0.507057  -0.0622276
  0.815521   -0.444006    0.541366     0.530006   -0.406188     0.0289695   -0.451507   -0.00913016  -0.619764  -0.176165
 -0.918345    0.977493    0.530006     5.10815    -0.444263    -0.566176    -1.54972     0.0125561   -1.01341    0.526728
 -0.614269   -0.204644   -0.406188    -0.444263    2.10137     -0.00087146   1.99958    -0.80453      0.811735   1.05362
  1.02152     0.450596    0.0289695   -0.566176   -0.00087146   2.52233      0.46488    -0.905639    -0.449422  -0.032833
 -0.0725863  -1.1918     -0.451507    -1.54972     1.99958      0.46488      3.37858    -0.980025     0.588669   0.627564
 -1.95607    -1.06627    -0.00913016   0.0125561  -0.80453     -0.905639    -0.980025    3.11701      1.21027   -1.41447
 -1.66412    -0.507057   -0.619764    -1.01341     0.811735    -0.449422     0.588669    1.21027      1.91707    0.441389
  0.457919   -0.0622276  -0.176165     0.526728    1.05362     -0.032833     0.627564   -1.41447      0.441389   2.4112

julia> cpapd = cholesky(apdh, RowMaximum())
CholeskyPivoted{Float32, Matrix{Float32}, Vector{Int64}}
U factor with rank 10:
10×10 UpperTriangular{Float32, Matrix{Float32}}:
 2.26012  -0.406325   0.432496  -0.68568   -0.250507  -0.196566  -0.448386    0.0055555   0.234503     0.233053
  ⋅        1.80666   -0.240994  -0.19439    0.509079  -0.384212  -1.02195    -1.08145     0.504138     0.305877
  ⋅         ⋅         1.69702   -0.555147   0.401659  -0.125056  -0.329646   -0.78331    -0.24981     -0.052626
  ⋅         ⋅          ⋅         1.60077    0.384224   1.07492   -0.0627441  -1.01282    -0.207023     0.510761
  ⋅         ⋅          ⋅          ⋅         1.3753    -0.158      0.0836318   0.254541    0.00796201  -0.22197
  ⋅         ⋅          ⋅          ⋅          ⋅         0.847978   0.436789   -0.221684    0.0308417    0.738552
  ⋅         ⋅          ⋅          ⋅          ⋅          ⋅         0.601048   -0.230679   -0.181164     0.946928
  ⋅         ⋅          ⋅          ⋅          ⋅          ⋅          ⋅          0.375373    0.24624     -0.453482
  ⋅         ⋅          ⋅          ⋅          ⋅          ⋅          ⋅           ⋅          0.180238    -0.167688
  ⋅         ⋅          ⋅          ⋅          ⋅          ⋅          ⋅           ⋅           ⋅           0.523937
permutation:
10-element Vector{Int64}:
  4
  1
  2
  7
  6
  5
  9
  8
  3
 10

julia> @test all(diff(diag(real(cpapd.factors))).<=0.) # diagonal should be non-increasing
Test Failed at REPL[110]:1
  Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)

ERROR: There was an error during testing

Any ideas? @amontoison @dkarrasch

@ViralBShah
Copy link
Collaborator

@staticfloat I believe it is fine to merge this PR.

@amontoison
Copy link
Contributor

Any ideas? @amontoison @dkarrasch

I highly suspect a bug in the new BLAS / LAPACK routines of Apple Accelerate.
I have a different result with OpenBLAS and Intel MKL.

julia> cpapd = cholesky(apdh, RowMaximum())
CholeskyPivoted{Float32, Matrix{Float32}, Vector{Int64}}
U factor with rank 10:
10×10 UpperTriangular{Float32, Matrix{Float32}}:
 2.26012  -0.406325   0.432496  -0.68568    0.233053  -0.250507  -0.196566   -0.448386    0.0055555   0.234503
          1.80666   -0.240994  -0.19439    0.305877   0.509079  -0.384212   -1.02195    -1.08145     0.504138
                    1.69702   -0.555147  -0.052626   0.401659  -0.125056   -0.329646   -0.78331    -0.24981
                              1.60077    0.510761   0.384224   1.07492    -0.0627441  -1.01282    -0.207023
                                        1.4141    -0.21588    0.467681    0.617479   -0.430583   -0.206794
                                                  1.35825   -0.0856505   0.182823    0.1893     -0.0248058
                                                            0.7197      0.116784   -0.014745    0.166019
                                                                       0.361517    0.0519991   0.0500296
                                                                                  0.289647    0.155059
                                                                                             0.17166
permutation:
10-element Vector{Int64}:
  4
  1
  2
  7
 10
  6
  5
  9
  8
  3

@ViralBShah
Copy link
Collaborator

ViralBShah commented Apr 3, 2023

@amontoison That is what I observed too with openblas, and I was just wondering if we are doing something non-standard or making assumptions in the way we call those routines right now. Great idea to check with MKL, because that suggests that it really is a bug.

@staticfloat
Copy link
Member Author

The one thing I want to do before merging this PR is to provide sane fallback behavior for the set/get threads API when the underlying library has no concept of threads. My plan is to make set threads do nothing, and get threads always return 1.

@ViralBShah
Copy link
Collaborator

ViralBShah commented Apr 3, 2023

Does someone here know what LAPACK call is the culprit for the wrong pivoted cholesky results? Is it xPSTRF?

@amontoison
Copy link
Contributor

Yes, it seems that it's pstrf.

@ViralBShah
Copy link
Collaborator

Do the Intel macs also have the updated ILP64 BLAS and LAPACK?

@ctkelley
Copy link

ctkelley commented Apr 4, 2023

Yes. @staticfloat 's instructions worked fine on my 1999 Intel Mac.

@staticfloat
Copy link
Member Author

I'm reporting the dpstrf bug to Apple filed as FB 12098971, easy reproducer available here: https://gist.github.com/staticfloat/494272e86b44a3e2d31de898473b3c64

This isolates a known LAPACK failure on the new Accelerate; use it to
track the issue as Apple hopefully fixes it.
Because we now know that Accelerate has a problem with `dpstrf()`, let's
not default to the new LAPACK version quite yet.
@staticfloat staticfloat force-pushed the sf/ilp64_accelerate branch from 8022434 to daa47a3 Compare April 5, 2023 15:21
@ViralBShah
Copy link
Collaborator

ViralBShah commented Apr 5, 2023

I suspect that the old LAPACK release that Apple used to ship (LAPACK 3.1 IIRC) didn't even have dpstrf. So may be ok to not worry about it.

I assume the plan for now should be for LBT to support all the new stuff, but Julia to default to openblas, with users opting into using the new capabilities through AppleAccelerate.jl (like we do MKL.jl).

@staticfloat
Copy link
Member Author

I suspect that the old LAPACK release that Apple used to ship (LAPACK 3.1 IIRC) didn't even have dpstrf. So may be ok to not worry about it.

The old LAPACK release did have dsptrf_, and it calculated it correctly. So it appears to be a regression in the new LAPACK.

I assume the plan for now should be for LBT to support all the new stuff, but Julia to default to openblas, with users opting into using the new capabilities through AppleAccelerate.jl (like we do MKL.jl).

Yes. I think AppleAccelerate.jl should incorporate some LBT invocations similar to what I posted in this thread, so that it can load the ILP64 and new LP64-LAPACK bindings instead of the old ones. But that can only happen after this new release is cut.

@staticfloat staticfloat merged commit b6320c1 into main Apr 5, 2023
@amontoison
Copy link
Contributor

Will we have the release 5.6.0 of LBT with Julia 1.9?

@ViralBShah
Copy link
Collaborator

We should open the PR on Julia master and mark it for backporting. Even if not 1.9, I think we can have it in 1.9.1.

@staticfloat
Copy link
Member Author

Apple got back to me; they have identified the issue and are trying to get a fix into a future macOS version.

@ViralBShah ViralBShah deleted the sf/ilp64_accelerate branch April 6, 2023 15:10
@amontoison
Copy link
Contributor

We should open the PR on Julia master and mark it for backporting. Even if not 1.9, I think we can have it in 1.9.1.

Thanks @ViralBShah! I'm working with the HSL team to release of JuliaHSL package with a precompiled HSL_jll inside (we can't precompile it with Yggdrasil because of the academic license).
It convinces us to precompile HSL_jll with LBT for Julia ≥ 1.9.

@ViralBShah
Copy link
Collaborator

ViralBShah commented Apr 7, 2023

Wouldn't the licensing issue be the same if inside HSL.jl vs BB? Perhaps the right thing is for them to serve the JLL through a separate registry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants