Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build not portable despite using SAGE_FAT_BINARY=yes, NTL/openblas/numpy-related #29537

Closed
mkoeppe opened this issue Apr 20, 2020 · 93 comments
Closed

Comments

@mkoeppe
Copy link
Contributor

mkoeppe commented Apr 20, 2020

We fix several issues involving SAGE_FAT_BINARY:

Depends on #31493

CC: @dimpase @orlitzky @kiwifb @embray @darijgr @tscrim @kliem @seblabbe

Component: porting

Keywords: sd111

Author: Jonathan Kliem, Matthias Koeppe

Branch/Commit: d6dced2

Reviewer: Matthias Koeppe, Jonathan Kliem

Issue created by migration from https://trac.sagemath.org/ticket/29537

@mkoeppe mkoeppe added this to the sage-9.1 milestone Apr 20, 2020
@dimpase
Copy link
Member

dimpase commented Apr 20, 2020

comment:3

could it be some kind of hardware mismatch?

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Apr 20, 2020

comment:4

Possibly, I'll investigate

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Apr 20, 2020

comment:5

I will try with the configuration that Erik's https://github.com/sagemath/sage-windows/blob/master/Makefile
uses

@mkoeppe

This comment has been minimized.

@mkoeppe mkoeppe changed the title Crash on building documentation and on startup, NTL-related, on cygwin-standard cygwin-standard: build not portable despite using SAGE_FAT_BINARY=yes, NTL-related May 5, 2020
@mkoeppe mkoeppe modified the milestones: sage-9.1, sage-9.2 May 5, 2020
@embray
Copy link
Contributor

embray commented Jun 26, 2020

comment:7

I looked at one of the build logs on GitHub and it does look like SAGE_FAT_BINARY=yes is being passed down, so other than that I'm not aware of any issue that would cause this unless there's a new issue with NTL I haven't seen before.

@embray
Copy link
Contributor

embray commented Jun 26, 2020

comment:8

Possibly related: #29109

@mkoeppe mkoeppe modified the milestones: sage-9.2, sage-9.3 Oct 24, 2020
@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 9, 2020

Changed keywords from none to sd111

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 25, 2020

comment:11

It's quite possible that this also affects our Docker builds.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 28, 2020

comment:12

A report regarding the Docker builds: https://groups.google.com/g/sage-devel/c/Dbd4nFi8R7I (but no NTL involvement visible)

@mkoeppe

This comment has been minimized.

@mkoeppe mkoeppe changed the title cygwin-standard: build not portable despite using SAGE_FAT_BINARY=yes, NTL-related build not portable despite using SAGE_FAT_BINARY=yes, NTL-related Dec 28, 2020
@kliem
Copy link
Contributor

kliem commented Dec 29, 2020

comment:13

From NTL/doc/tour-win.html:

152 Configuration flags.
153 </b>
154 <p>
155 
156 Also in directory "<tt>include/NTL</tt>" is a file called "<tt>config.h</tt>".
157 You can edit this file to override some of NTL's default options
158 for basic configuration and performance.
159 Again, the defaults should be good for
160 Windows with MSVC++.

In config.h there is a couple of flags that needs to be set by hand apparently.

Those look like they might make problems:

195 #if 0
196 #define NTL_CLEAN_INT
197 
198 /*
199  *   This will disallow the use of some non-standard integer arithmetic
200  *   that may improve performance somewhat.
201  *
202  */
203 
204 #endif
205 
206 #if 1
207 #define NTL_CLEAN_PTR
208 
209 /*
210  *   This will disallow the use of some non-standard pointer arithmetic
211  *   that may improve performance somewhat.
212  *
213  */
214 
215 #endif
216 
217 #if 1
218 #define NTL_SAFE_VECTORS
219 
220 /*
221  * This will compile NTL in "safe vector" mode, only assuming
222  * the relocatability property for trivial types and types
223  * explicitly declared relocatable.  See vector.txt for more details.
224  */
225 
226 #endif
227 
228 #if 0
229 #define NTL_ENABLE_AVX_FFT
230 
231 /*
232  * This will compile NTL in a way that enables an AVX implemention
233  * of the small-prime FFT.
234  */
235 
236 #endif
237 
238 
239 #if 0
240 #define NTL_AVOID_AVX512
241  
242 /*
243  * This will compile NTL in a way that avoids 512-bit operations,
244  * even if AVX512 is available.
245  */
246 
247 #endif
248  
249 #if 0
250 #define NTL_RANGE_CHECK
251 
252 /*
253  *   This will generate vector subscript range-check code.
254  *   Useful for debugging, but it slows things down of course.
255  *
256  */
257 
258 #endif

@kliem
Copy link
Contributor

kliem commented Dec 29, 2020

comment:14

So I guess we have to patch NTL as to avoid those flags by configuration.

@kliem
Copy link
Contributor

kliem commented Dec 29, 2020

comment:15

The relevant configuration is done in src/DoConfig, which is unfortunately perl (yet another language whichs syntax I do not understand).

But one might even define those things as environment variables and this might be passed through??

In src/DoConfig there is the following list:

 55 %ConfigFlag = (
 56 
 57 'NTL_LEGACY_NO_NAMESPACE' => 'off',
 58 'NTL_LEGACY_INPUT_ERROR'  => 'off',
 59 'NTL_DISABLE_LONGDOUBLE'  => 'off',
 60 'NTL_DISABLE_LONGLONG'    => 'off',
 61 'NTL_DISABLE_LL_ASM'      => 'off',
 62 'NTL_MAXIMIZE_SP_NBITS'   => 'off',
 63 'NTL_LEGACY_SP_MULMOD'    => 'off',
 64 'NTL_THREADS'             => 'on',
 65 'NTL_TLS_HACK'            => 'on',
 66 'NTL_EXCEPTIONS'          => 'off',
 67 'NTL_STD_CXX11'           => 'on',
 68 'NTL_STD_CXX14'           => 'off',
 69 'NTL_DISABLE_MOVE_ASSIGN' => 'on',
 70 'NTL_DISABLE_MOVE'        => 'off',
 71 'NTL_THREAD_BOOST'        => 'on',
 72 'NTL_GMP_LIP'             => 'on',
 73 'NTL_GF2X_LIB'            => 'off',
 74 'NTL_X86_FIX'             => 'off',
 75 'NTL_NO_X86_FIX'          => 'off',
 76 'NTL_NO_INIT_TRANS'       => 'on',
 77 'NTL_CLEAN_INT'           => 'off',
 78 'NTL_CLEAN_PTR'           => 'on',
 79 'NTL_SAFE_VECTORS'        => 'on',
 80 'NTL_RANGE_CHECK'         => 'off',
 81 'NTL_ENABLE_AVX_FFT'      => 'off',
 82 'NTL_AVOID_AVX512'        => 'off',
 83 
 84 
 85 'NTL_SPMM_ULL'            => 'off',
 86 'NTL_AVOID_BRANCHING'     => 'off',
 87 'NTL_FFT_BIGTAB'          => 'off',
 88 'NTL_FFT_LAZYMUL'         => 'off',
 89 'NTL_TBL_REM'             => 'off',
 90 'NTL_CRT_ALTCODE'         => 'off',
 91 'NTL_CRT_ALTCODE_SMALL'   => 'off',
 92 'NTL_GF2X_NOINLINE'       => 'off',
 93 'NTL_GF2X_ALTCODE'        => 'off',
 94 'NTL_GF2X_ALTCODE1'       => 'off',
 95 
 96 
 97 );

What makes me thing that it might be AVX512 related, is that we haven't encountered this before. Github workflows is the only place where I encountered processors capable of AVX512.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 29, 2020

comment:16

I agree that NTL_AVOID_AVX512 is a top suspect here.
Perhaps it can be passed to configure, like we already pass NATIVE=off when SAGE_FAT_BINARY=yes?

@embray
Copy link
Contributor

embray commented Mar 17, 2021

comment:67

Replying to @mkoeppe:

Yes, in this run I tried to fix the problem by changing to rebaseall...

Ah, OK. I think that's a good idea. It just needs to be run outside a Cygwin process with no other Cygwin processes running (since it can also modify the cygwin DLL itself and other support libraries that are not normally touched by plain rebase).

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Mar 17, 2021

comment:68

Is this the cygwin dash that should be used?

@embray
Copy link
Contributor

embray commented Mar 17, 2021

comment:69

Yes, or you can just call C:\cygwin64\bin\rebaseall directly without going through a shell, I think.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Mar 17, 2021

comment:70

Right, but I want to keep it in the extraction script so that it is easy to use manually too

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Mar 17, 2021

comment:71

OK, testing with dash at https://github.com/mkoeppe/sage/actions/runs/662065807

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Mar 17, 2021

comment:72

This worked! https://github.com/mkoeppe/sage/runs/2134463553

I'll push the fixes to this ticket.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 17, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

d0df00c.github/workflows/extract-sage-local.sh: Use sage-rebase.sh --all
da978f8.github/workflows/{extract-sage-local.sh, ci-cygwin-*.yml): Use /bin/dash for scripts invoking rebaseall

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 17, 2021

Changed commit from 2037d68 to da978f8

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Mar 17, 2021

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Mar 17, 2021

comment:74

Now testing at https://github.com/mkoeppe/sage/actions/runs/662602246

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Mar 18, 2021

comment:75

With the rebasing fixes applied, I am getting again to sage-iii (https://github.com/mkoeppe/sage/runs/2140904387?check_suite_focus=true)

  [sagelib-9.3.beta9]   Generating auto-generated sources
  [sagelib-9.3.beta9]   Building interpreters for fast_callable
  [sagelib-9.3.beta9]   -> First build of interpreters
  [sagelib-9.3.beta9]   running build_cython
  [sagelib-9.3.beta9]   Enabling Cython debugging support
  [sagelib-9.3.beta9]   /cygdrive/d/a/sage/sage/build/pkgs/sagelib/spkg-install: line 58: 61038 Illegal instruction     (core dumped) python3 -u setup.py --no-user-cfg build install

which is again coming from numpy, I think.
Help....

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Mar 19, 2021

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Mar 19, 2021

comment:76

But the current branch is already an improvement, so let's get this in...

@kliem
Copy link
Contributor

kliem commented Mar 19, 2021

comment:77

LGTM.

@kliem
Copy link
Contributor

kliem commented Mar 19, 2021

Changed reviewer from Matthias Koeppe, ..., https://github.com/mkoeppe/sage/actions/runs/662602246 to Matthias Koeppe, Jonathan Kliem

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Mar 19, 2021

comment:78

Thanks!

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Mar 19, 2021

comment:79

Follow-up in #31521

@vbraun
Copy link
Member

vbraun commented Mar 20, 2021

comment:80

Merge conflict

    STDOUT: Auto-merging tox.ini
    STDOUT: Auto-merging .github/workflows/ci-cygwin-standard.yml
    STDOUT: CONFLICT (content): Merge conflict in .github/workflows/ci-cygwin-standard.yml
    STDOUT: Auto-merging .github/workflows/ci-cygwin-minimal.yml
    STDOUT: CONFLICT (content): Merge conflict in .github/workflows/ci-cygwin-minimal.yml
    STDOUT: Automatic merge failed; fix conflicts and then commit the result.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 20, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

7081189tox.ini, build/bin/write-dockerfile.sh: Use configure --enable-download-from-upstream-url --enable-experimental-packages instead of setting SAGE_SPKG directly
d6dced2Merge #31493

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 20, 2021

Changed commit from da978f8 to d6dced2

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Mar 20, 2021

Dependencies: #31493

@vbraun
Copy link
Member

vbraun commented Mar 22, 2021

Changed branch from public/29537 to d6dced2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants