Theano dot products lead to segfault on Azure pipelines #255

Closed
rodluger opened this issue Sep 14, 2020 · 4 comments
Labels
bug Something isn't working

Comments

@rodluger
Owner

rodluger commented Sep 14, 2020

This is independent of starry but is causing many of the tests (in particular those that call map.render()) to fail.

Here is an MWE of a Theano operation that segfaults on Azure, and here are the failing test results. The operation is a very simple dot product of two large-ish matrices:

import numpy as np
import theano
import theano.tensor as tt
import theano.sparse as ts
import pytest

# Use double precision throughout
theano.config.floatX = "float64"

# Number of rows M of the left-hand matrix in each test case
sizes = [10, 100, 1000, 10000, 100000]


def dot_sum(x, y):
    # Sum of all entries of the matrix product of x and y
    dot = tt.dot(x, y)
    return tt.sum(dot)


# Compile the graph once and reuse it for every size
x = tt.dmatrix()
y = tt.dmatrix()
func = theano.function([x, y], dot_sum(x, y))


@pytest.mark.parametrize("M", sizes)
def test_dot_product(M, N=300, L=10):
    u = np.random.randn(M, N)
    v = np.random.randn(N, L)
    print(func(u, v))

Did the stack size change on Azure? Are we exceeding the RAM somehow? Looking into this.

@rodluger rodluger added the bug Something isn't working label Sep 14, 2020
@rodluger
Owner Author

@twiecki Have you seen this before? It's likely an Azure RAM issue, but I wanted to check. The segfault happens when I try to dot a (100000, 300) matrix with a (300, 10) matrix in double precision. The issue is specific to compiled functions; calling .eval() on the node works fine.
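
A rough back-of-the-envelope check of the RAM hypothesis, as a minimal sketch: it counts only the two inputs and the result of the dot product at 8 bytes per float64, and ignores any temporary buffers that Theano or BLAS may allocate. The helper name matrix_bytes is made up for this illustration.

```python
ITEMSIZE = 8  # bytes per float64 element


def matrix_bytes(rows, cols, itemsize=ITEMSIZE):
    """Size in bytes of a dense rows x cols matrix (hypothetical helper)."""
    return rows * cols * itemsize


# Shapes from the failing case: (M, N) dotted with (N, L)
M, N, L = 100000, 300, 10

u_bytes = matrix_bytes(M, N)    # left-hand input
v_bytes = matrix_bytes(N, L)    # right-hand input
out_bytes = matrix_bytes(M, L)  # result of the dot product
total_bytes = u_bytes + v_bytes + out_bytes

print(f"u: {u_bytes / 1e6:.1f} MB, v: {v_bytes / 1e6:.3f} MB, "
      f"out: {out_bytes / 1e6:.1f} MB, total: {total_bytes / 1e6:.1f} MB")
```

Under these assumptions the arrays themselves add up to roughly 248 MB, most of it in the (100000, 300) input.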

@rodluger rodluger changed the title from "Dot theano products lead to segfault on Azure pipelines" to "Theano dot products lead to segfault on Azure pipelines" Sep 15, 2020
@rodluger
Owner Author

The culprit is the latest release of openblas (0.3.10, July 2020). Reverting to version 0.3.6 fixes this issue (see tests).

@rodluger
Owner Author

@fbartolic @dfm and anyone else that runs into this:

This is a known issue in OpenBLAS:

numpy/numpy#16913
OpenMathLib/OpenBLAS#2732

The workaround is to

conda install openblas=0.3.6

for now if you find that starry is throwing segfaults.

For reference, here are the test results with openblas=0.3.6 (passing) and openblas=0.3.10 (failing).

rodluger added a commit that referenced this issue Sep 15, 2020
Revert openblas to 0.3.6 as per #255
@rodluger
Owner Author

I haven't run into this issue lately, and the dev version of the code (on branch restructure, soon to be merged to master) is currently running fine on GitHub Actions with openblas==0.3.13.

https://github.com/rodluger/starry/runs/1953734532?check_suite_focus=true
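
To summarize the versions mentioned in this thread, here is a minimal sketch of a lookup that classifies an OpenBLAS version string. The function and constant names are hypothetical, and the sets contain only the three versions reported above (0.3.6 and 0.3.13 passing, 0.3.10 failing); nothing is claimed about any other release.

```python
# Versions reported in this thread only; every other release is unknown here.
REPORTED_FAILING = {(0, 3, 10)}
REPORTED_PASSING = {(0, 3, 6), (0, 3, 13)}


def parse_version(text):
    """Turn a string such as '0.3.10' into a tuple of ints, (0, 3, 10)."""
    return tuple(int(part) for part in text.strip().split("."))


def openblas_status(version_text):
    """Classify a version against the reports in this thread (hypothetical helper)."""
    version = parse_version(version_text)
    if version in REPORTED_FAILING:
        return "reported failing"
    if version in REPORTED_PASSING:
        return "reported passing"
    return "not reported in this thread"


for candidate in ["0.3.6", "0.3.10", "0.3.13", "0.3.9"]:
    print(candidate, "->", openblas_status(candidate))
```

The installed version can be read off the output of conda list openblas and passed to openblas_status as a string.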
