Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vcpkg baseline] Re-fix lapack-reference to find blas on windows-static #19608

Closed
wants to merge 16 commits into from

Conversation

JackBoosY
Copy link
Contributor

@JackBoosY JackBoosY commented Aug 17, 2021

The origin issue is:

openblas comes with lapack, lapack comes with blas, cblas, lapacke.
vcpkg only wants to build one lib of those in the ports, so lapack-reference should only build lapack, openblas should only build openblas. Due to the gfortran constrains lapack needs to build its internal blas in x64-windows-static.

The problem: FindBlas.cmake picks up a spurious openblas installation of vcpkg in one port since it was installed while another port was looking for it but that port did not declare the dependency on openblas -> error in colmap since openblas is not restored by CI.

This PR will

  1. Add port blas-reference and disable lapack-reference to build internal blas.
  2. Add FindBLAS.cmake provided by cmake and modify some codes to adapt to Windows or x64 build.
  3. Add vcpkg-cmake-wrapper to blas-reference to avoid finding openblas.

Related: #19628

Thanks @Neumann-A!

@JackBoosY JackBoosY added category:port-feature The issue is with a library, which is requesting new capabilities that didn’t exist info:internal This PR or Issue was filed by the vcpkg team. labels Aug 17, 2021
@JackBoosY
Copy link
Contributor Author

cc @Neumann-A @cenit

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new experimental fast check for PR issues. Please let us know if this bot is helpful!

After committing all other changes, the version database must be updated
git add -u && git commit
git checkout d679a1e0befa9c9e36fcda169c94f6daba2224b7 -- versions
./vcpkg x-add-version --all
Diff
diff --git a/versions/b-/blas.json b/versions/b-/blas.json
index 7fb0192..6d991bf 100644
--- a/versions/b-/blas.json
+++ b/versions/b-/blas.json
@@ -1,5 +1,10 @@
 {
   "versions": [
+    {
+      "git-tree": "2c785ddee8825820b71ad4f8f3be588c8158906e",
+      "version-semver": "3.10.0",
+      "port-version": 0
+    },
     {
       "git-tree": "2877c1693c63195d4edacfb42156c9d8874ad046",
       "version-string": "1",
diff --git a/versions/baseline.json b/versions/baseline.json
index 2855173..0ac5e18 100644
--- a/versions/baseline.json
+++ b/versions/baseline.json
@@ -421,7 +421,7 @@
       "port-version": 0
     },
     "blas": {
-      "baseline": "1",
+      "baseline": "3.10.0",
       "port-version": 0
     },
     "blaze": {

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new experimental fast check for PR issues. Please let us know if this bot is helpful!

After committing all other changes, the version database must be updated
git add -u && git commit
git checkout d679a1e0befa9c9e36fcda169c94f6daba2224b7 -- versions
./vcpkg x-add-version --all
Diff
diff --git a/versions/b-/blas.json b/versions/b-/blas.json
index 7fb0192..801ba36 100644
--- a/versions/b-/blas.json
+++ b/versions/b-/blas.json
@@ -1,5 +1,10 @@
 {
   "versions": [
+    {
+      "git-tree": "8876f20d092c072b4f7a80db2e5a877ab000dda9",
+      "version-semver": "3.10.0",
+      "port-version": 0
+    },
     {
       "git-tree": "2877c1693c63195d4edacfb42156c9d8874ad046",
       "version-string": "1",
diff --git a/versions/baseline.json b/versions/baseline.json
index 2855173..0ac5e18 100644
--- a/versions/baseline.json
+++ b/versions/baseline.json
@@ -421,7 +421,7 @@
       "port-version": 0
     },
     "blas": {
-      "baseline": "1",
+      "baseline": "3.10.0",
       "port-version": 0
     },
     "blaze": {

@Neumann-A
Copy link
Contributor

?!?!? who removed the blas metaport?

also what you are adding here is blas-reference. just like lapack-reference.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new experimental fast check for PR issues. Please let us know if this bot is helpful!

After committing all other changes, the version database must be updated
git add -u && git commit
git checkout d679a1e0befa9c9e36fcda169c94f6daba2224b7 -- versions
./vcpkg x-add-version --all
Diff
diff --git a/versions/b-/blas.json b/versions/b-/blas.json
index 7fb0192..d1c61b5 100644
--- a/versions/b-/blas.json
+++ b/versions/b-/blas.json
@@ -1,5 +1,10 @@
 {
   "versions": [
+    {
+      "git-tree": "75b6ee2b0d4236f48b310cc69de037bfb9b34cc6",
+      "version-semver": "3.10.0",
+      "port-version": 0
+    },
     {
       "git-tree": "2877c1693c63195d4edacfb42156c9d8874ad046",
       "version-string": "1",
diff --git a/versions/baseline.json b/versions/baseline.json
index 2855173..0ac5e18 100644
--- a/versions/baseline.json
+++ b/versions/baseline.json
@@ -421,7 +421,7 @@
       "port-version": 0
     },
     "blas": {
-      "baseline": "1",
+      "baseline": "3.10.0",
       "port-version": 0
     },
     "blaze": {

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new experimental fast check for PR issues. Please let us know if this bot is helpful!

After committing all other changes, the version database must be updated
git add -u && git commit
git checkout d679a1e0befa9c9e36fcda169c94f6daba2224b7 -- versions
./vcpkg x-add-version --all
Diff
diff --git a/versions/b-/blas.json b/versions/b-/blas.json
index 7fb0192..a283f48 100644
--- a/versions/b-/blas.json
+++ b/versions/b-/blas.json
@@ -1,5 +1,10 @@
 {
   "versions": [
+    {
+      "git-tree": "a297c47bd805c00b2308f027d288f3d9544aae33",
+      "version-semver": "3.10.0",
+      "port-version": 0
+    },
     {
       "git-tree": "2877c1693c63195d4edacfb42156c9d8874ad046",
       "version-string": "1",
diff --git a/versions/baseline.json b/versions/baseline.json
index 2855173..0ac5e18 100644
--- a/versions/baseline.json
+++ b/versions/baseline.json
@@ -421,7 +421,7 @@
       "port-version": 0
     },
     "blas": {
-      "baseline": "1",
+      "baseline": "3.10.0",
       "port-version": 0
     },
     "blaze": {

@JackBoosY
Copy link
Contributor Author

@Neumann-A I want to clarify the relationship between blas and lapack's reference and non-reference, can you elaborate on it?

@Neumann-A
Copy link
Contributor

@JackBoosY: Please keep blas a metaport. What you are adding here is blas-reference. There a many ways vcpkg could provide blas/lapack but the current idea is to have the blas port just check that blas can be found and depend on vcpkg defaults blas implementation (openblas). Using an overlay this could be changed to e.g. using mkl instead.

@JackBoosY
Copy link
Contributor Author

JackBoosY commented Aug 17, 2021

@Neumann-A What about the dependencies between them? This PR start with a ci regression when building colmap:x64-windows-static:

ninja: error: 'D:/installed/x64-windows-static/debug/lib/openblas.lib', needed by 'src/exe/colmap.exe', missing and no known rule to make it

Colmap's dependency chain:
colmap -> ceres/suitesparse -> lapack -> lapack-reference -> blas -> openblas
Their messy dependencies have now affected other ports such as colmap.

The fact is: lapack-reference generates blas and conflicts with openblas.

@Neumann-A
Copy link
Contributor

@JackBoosY That is just someone linking openblas but not declaring the dependency correctly:

neumann@heineken MINGW64 /e/qt6 (update_qt6_6.2)
$ ./vcpkg.exe depend-info colmap:x64-windows-static | findstr openblas

neumann@heineken MINGW64 /e/qt6 (update_qt6_6.2)

vs

neumann@heineken MINGW64 /e/qt6 (update_qt6_6.2)
$ ./vcpkg.exe depend-info colmap:x64-windows | findstr openblas
openblas:
blas: openblas

so some target has a dependency on openblas but does not have it in its build-depends which is why ci is not restoring openblas. Find the target which links openblas and make the dependency on blas explicit.

@Neumann-A
Copy link
Contributor

furthermore this is related to #15571. vcpkg needs to force the internally used blas and lapack version to avoid finding spurious other stuff.

@JackBoosY
Copy link
Contributor Author

JackBoosY commented Aug 17, 2021

@Neumann-A But the libblas generated by lapack-reference contains the same symbols as the openblas generated by openblas.

And according to the upstream issue OpenMathLib/OpenBLAS#203, the blas used in lapack-reference is not openblas.

@cenit
Copy link
Contributor

cenit commented Aug 17, 2021

And according to the upstream issue OpenMathLib/OpenBLAS#203, the blas used in lapack-reference is not openblas.

i think that issue talks about the opposite: openblas contains lapack-reference

the two algebra libraries are really tangled together, the best option is disabling lapack in openblas and blas in lapack-reference, and then explicitly depend on both.

as @Neumann-A is saying, colmap error should be just a matter of fact of a port depending (wrongly) on openblas(!static)

@JackBoosY
Copy link
Contributor Author

JackBoosY commented Aug 17, 2021

After comparing the blas source code in lapack-reference with the source code in blas, I found that they are almost the same (except some comments).
Although the feature noblas of lapack-reference depends on openblas, it still uses the internal blas code to generate libblas, and creates a link with the pkgconfig of openblas In the lib folder, this obviously violates the dependency.

@Neumann-A
Copy link
Contributor

The problem is lapack-reference cannot link against openblas from vcpkg if VCPKG_CRT_LINKAGE is static since gfortran cannot simply link against the MSVC CRT in this case. As such lapack-reference has to use its internal blas implementation in this case. As already said switching to ifort solves this problem.

@Neumann-A
Copy link
Contributor

After comparing the blas source code in lapack-reference with the source code in blas, I found that they are almost the same (except some comments).

Yeah because it is the reference implementation in both cases ;) lapack-reference could probably do a FETCH_CONTENT of the blas stuff

@JackBoosY
Copy link
Contributor Author

@JackBoosY That is just someone linking openblas but not declaring the dependency correctly:

neumann@heineken MINGW64 /e/qt6 (update_qt6_6.2)
$ ./vcpkg.exe depend-info colmap:x64-windows-static | findstr openblas

neumann@heineken MINGW64 /e/qt6 (update_qt6_6.2)

vs

neumann@heineken MINGW64 /e/qt6 (update_qt6_6.2)
$ ./vcpkg.exe depend-info colmap:x64-windows | findstr openblas
openblas:
blas: openblas

so some target has a dependency on openblas but does not have it in its build-depends which is why ci is not restoring openblas. Find the target which links openblas and make the dependency on blas explicit.

That because:

Feature: blas-select
Build-Depends: lapack-reference[core, noblas](!windows|!static)
Description: Use external optimized BLAS

According to my poor experience with blas and lapack, I can't distinguish the relationship between them at all.

@Neumann-A
Copy link
Contributor

Neumann-A commented Aug 17, 2021

@JackBoosY: Just think of the CMakeLists.txt of both ports as superbuilds.

openblas comes with lapack
lapack comes with blas, cblas, lapacke

vcpkg only wants to build one lib of those in the ports:
so lapack-reference should only build lapack
openblas should only build openblas

due to the gfortran constrains lapack needs to build its internal blas in x64-windows-static.
The problem: FindBlas.cmake picks up a spurious openblas installation of vcpkg in one port since it was installed while another port was looking for it but that port did not declare the dependency on openblas -> error in colmap since openblas is not restored by CI.

That is why vcpkg needs to force the correct blas library to be used via a wrapper.

@JackBoosY
Copy link
Contributor Author

JackBoosY commented Aug 17, 2021

The problem: FindBlas.cmake picks up a spurious openblas installation of vcpkg in one port since it was installed while another port was looking for it but that port did not declare the dependency on openblas -> error in colmap since openblas is not restored by CI.

But why create a link to the pkgconfig file of openblas? lapack-reference also provides blas.pc.
If it uses internal blas, it should not depend on blas in CONTROL.

@Neumann-A
Copy link
Contributor

If it uses internal blas, it should not depend on blas in CONTROL.

it does. since lapack is providing blas. in this case.

But why create a link to the pkgconfig file of openblas?

that is probably a bug.

@c72578 c72578 mentioned this pull request Aug 17, 2021
@JackBoosY
Copy link
Contributor Author

If it uses internal blas, it should not depend on blas in CONTROL.

it does. since lapack is providing blas. in this case.

But why create a link to the pkgconfig file of openblas?

that is probably a bug.

This will be fixed in #19628.

@JackBoosY JackBoosY added category:port-bug The issue is with a library, which is something the port should already support and removed category:port-feature The issue is with a library, which is requesting new capabilities that didn’t exist labels Sep 1, 2021
@PhoebeHui PhoebeHui added the category:port-feature The issue is with a library, which is requesting new capabilities that didn’t exist label Sep 1, 2021
@JackBoosY JackBoosY removed the category:port-feature The issue is with a library, which is requesting new capabilities that didn’t exist label Sep 1, 2021
@JackBoosY
Copy link
Contributor Author

This baseline regression blocked 5-6 PRs, @ras0219-msft please make some suggestions.

Copy link
Contributor

@ras0219-msft ras0219-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Neumann-A Thanks for the summary (#19608 (comment))

openblas comes with lapack
lapack comes with blas, cblas, lapacke

vcpkg only wants to build one lib of those in the ports:
so lapack-reference should only build lapack
openblas should only build openblas

To put more words on the subject for other readers, here are our objectives:

  1. If a consumer depends on blas, then find_package(BLAS) will resolve (+pkgconfig)
  2. If a consumer depends on lapack, then find_package(LAPACK) will resolve (+pkgconfig)
  3. There will be only one provider of BLAS or LAPACK symbols in a given build tree
  4. (ideally) Official vcpkg triplets will provide a consistent default choice for lapack backend. I'm not yet certain whether the gfortran issue makes this intractable for static-crt triplets.
  5. Consumers have a choice of backends to provide blas and lapack by replacing blas or lapack and their choice should be respected by the tree (including builds failing if appropriate)

Then, we need to determine what the space of backends looks like. Given @Neumann-A's summary, it looks to me like there are several possible configurations:

  1. lapack-reference[noblas] & openblas[nolapack]
  2. openblas[blas,lapack]
  3. mpl & lapack-reference[noblas] (?)
  4. lapack-reference[noblas] & blas-reference[noblas]
  5. Built in MacOS option for BLAS?

Open questions:

  1. How does cblas fit into this? Can we have it be implied by blas or is it useful for it to be a unique metaport?
  2. What restrictions do the major packages have w.r.t. incompatibilities (such as the gfortran issue)?
  3. What other major packages are relevant to this space?
  4. How do we test this? This is another "forks the entire graph" situation like openssl/libressl/boringssl. Perhaps we need to just pick the default subset that we will test the world against and relegate other configurations to community registries that can provide more domain-specific testing?

@@ -0,0 +1,1191 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be checking in a thousand lines of CMake upstream sources. Either:

  1. (bad option) This should be downloaded from GitHub (https://raw.githubusercontent.com/Kitware/CMake/v3.21.1/Modules/FindBLAS.cmake)
  2. (good option) The point of providing this overlay is to force the vcpkg BLAS version. Therefore, there is no need for hundreds of these lines -- we just need to define the outputs listed in https://cmake.org/cmake/help/latest/module/FindBLAS.html to point at the copy of BLAS we just built.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we just need to define the outputs listed in

Please also add the check that the libraries are actual usable.

Copy link
Contributor Author

@JackBoosY JackBoosY Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I manually put FindBLAS.cmake into the port directory is that the version provided by cmake is not suitable for the port blas-reference.
In windows, if BUILD_X64 is turned on, blas-reference will generate libblas64.lib, and when searching, only blas will be found.
And FindBLAS.cmake whcih provided by cmake doesn't provide any marco to fix that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the FindBLAS from CMake is not usable and we need to replace it in this case, that's fine. However, we do not need a thousand lines of CMake to do that -- a much smaller file will do fine.

@cenit
Copy link
Contributor

cenit commented Sep 3, 2021

about the configurations: we have also clapack already in tree, it was our default lapack port before lapack-reference was accepted

also, macOS Accelerate Framework contains BLAS but also LAPACK.

Final note: all implementations must have compatible ABI and they have, at least as far as i know (not headers, it is common to just call libraries knowing their interface)

Comment on lines +9 to +10
set(BLA_VENDOR "Generic")
_find_package(${ARGS})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
set(BLA_VENDOR "Generic")
_find_package(${ARGS})
if (WIN32)
set(CMAKE_FIND_LIBRARY_PREFIXES_BAK ${CMAKE_FIND_LIBRARY_PREFIXES})
set(CMAKE_FIND_LIBRARY_PREFIXES lib)
endif()
set(BLA_VENDOR "Generic")
_find_package(${ARGS})
if (WIN32)
set(CMAKE_FIND_LIBRARY_PREFIXES ${CMAKE_FIND_LIBRARY_PREFIXES})
endif()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In FindBLAS.cmake line 1156:

# Generic BLAS library?
if(BLA_VENDOR STREQUAL "Generic" OR
   BLA_VENDOR STREQUAL "NVHPC" OR
   BLA_VENDOR STREQUAL "All")
  if(NOT BLAS_LIBRARIES)
    check_blas_libraries(
      BLAS_LIBRARIES
      BLAS
      sgemm
      ""
      "blas"
      ""
      ""
      ""
      )
  endif()
endif()

line 275:

function(CHECK_BLAS_LIBRARIES LIBRARIES _prefix _name _flags _list _deps _addlibdir _subdirs)
...
        find_library(${_prefix}_${_lib_var}_LIBRARY
          NAMES ${_library}
          NAMES_PER_DIR
          PATHS ${_extaddlibdir}
          PATH_SUFFIXES ${_subdirs}
        )

Unless we forced the 64bit openblas-reference library name to blas.lib or libblas.lib, we can't search it using FindBLAS.cmake provided by cmake.

@Neumann-A
Copy link
Contributor

Neumann-A commented Sep 3, 2021

I'm not yet certain whether the gfortran issue makes this intractable for static-crt triplets.

the problem lies mainly with the Fortran library. Because that is linked with the mingw stuff and requires ld as a linker due to internal symbol stuff. If We could link against a naively build windows Fortran library the problem goes away (#19413 does this by switching to ifort). One problem still is that the compiler Fortran libraries have not set the /APPCONTAINER bit so strictly speaking Fortran on UWP is currently not supported.
The only solution I see for the UWP problem in the future is to switch to the Fortran library provided by LLVM via Flang which will be build by vcpkg itself and will set the bit correclty.

There will be only one provider of BLAS or LAPACK symbols in a given build tree

that would be the ideal situation but it is already broken due to intel-mkl also providing blas/lapack symbols. It is more important to force the lookup of find_package(BLAS), find_package(LAPACK) to the selection of the metaport dependency blas/lapack. Here XOR features would be helpful to cover the cases without forcing users to use an overlay (Just like BLA_VENDOR in CMake).

How does cblas fit into this?

cblas is just a C wrapper to the legacy blas library. As such it can be installed using any blas implementation. It will wrap the fortran symbols using a cblas_ prefix

What restrictions do the major packages have w.r.t. incompatibilities (such as the gfortran issue)?

as said the incompatibility is mainly due to the linkage problems with libgfortran.lib in static builds. If we ca get a native windows variant there is only the issue with UWP an the missing appcontainer bit. There is no ABI implication.

What other major packages are relevant to this space?

Just look into CMakes FindBLAS.cmake/FindLAPACK.cmake. We currently have intel-mkl, openblas, blas-reference, lapack-reference, macOS Accelerate Framework, clapack (outdated IMHO). There are probably also variants with openMP/MPI enabled?

How do we test this? This is another "forks the entire graph" situation like openssl/libressl/boringssl. Perhaps we need to just pick the default subset that we will test the world against and relegate other configurations to community registries that can provide more domain-specific testing?

Give a reasonable default; Let the rest be handled by the community. There could be some basic testing of a minimal port tree which uses LAPACK/BLAS to check if it at least works. But that is always the case for features in ports. vcpkg never checks all features in CI.

@ras0219-msft
Copy link
Contributor

ras0219-msft commented Sep 3, 2021

If We could link against a naively build windows Fortran library the problem goes away

Hmm, maybe splitting the .lib back into objects and then using msvc to re-create the static lib would help? I'll try this.

Edit: I'm not sure how to precisely set up the conditions to test this.

... intel-mkl also providing blas/lapack ...

Isn't this handled by simply overlaying both lapack and blas to point at mkl then?

It is more important to force the lookup of find_package(BLAS), find_package(LAPACK) to the selection of the metaport dependency blas/lapack.

Agreed, this seems critical. If we think "cross" configurations like lapack-reference LAPACK on top of intel-mkl BLAS are also important, then this may require features to request "find_package(X) support" -- the exclusivity would be across packages and not within a package. It still is suboptimal, but I think it should be functional enough to work for now. Specifically, intel-mkl[lapack] would provide share/lapack/vcpkg-cmake-wrapper.cmake to force mkl as your lapack provider, but it wouldn't provide that by default. To make the global selection, you'd overlay a lapack -> intel-mkl[lapack] and blas -> intel-mkl[core]. Same story for the various other "interesting" configurations.

XOR features would be helpful to cover the cases without forcing users to use an overlay

Agreed, though I think that will still require the basic structure I've indicated above.

As such it can be installed using any blas implementation

But can we assume every blas implementation will install it? Do we need to have an explicit port that just creates the wrapper?

@@ -0,0 +1,5 @@
Use this package via the module FindBLAS that comes with CMake. To use in your CMakeLists.txt:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not true because we are checking-in our own modified version

@JackBoosY
Copy link
Contributor Author

The solution accepted, but the FindBLAS.cmake needs to be minimized.
Does FindBLAS.cmake only provided for port blas-reference? We have other port cblas, openblas.
We now have two approaches:

  1. Provide Findblas.cmake to blas-reference and reduce its content.
  2. Provide Findblas.cmake to all blas-like ports and keep all the content.

@JackBoosY
Copy link
Contributor Author

In favor of #21479.

@JackBoosY JackBoosY closed this Nov 19, 2021
@JackBoosY JackBoosY deleted the dev/jack/Re-add-blas branch May 26, 2022 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:port-bug The issue is with a library, which is something the port should already support info:internal This PR or Issue was filed by the vcpkg team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[lapack-reference] Does not contain instructions for using cblas
6 participants