segfault with nested functions (?) or on return #9222

Closed
jgoldfar opened this issue Dec 2, 2014 · 19 comments
@jgoldfar
Contributor

jgoldfar commented Dec 2, 2014

I am using Julia for an application in PDE-constrained optimization, and have run into an issue I haven't seen before, giving a segfault when trying to return (apparently) from a function call. I only mention the nested functions because in the optimization setup, I construct functionals using passed data, and then pass those functionals to my optimization routine. Not sure if that is creating some issue behind the scenes.

Based on the optimization trace, all of the iterations in the solver are working correctly, but then something breaks. The embedded PDE solver and optimization routine have passing unit tests, so I'm (pretty) sure the problem is not with their separate functionality, but somehow at the interface between them.

The gist with the backtrace is here https://gist.github.com/jgoldfar/b311086329af529fff42. The same issue appeared in version 0.3.0 I've been using for a while, but I figured I'd update to check if it resolved somehow. I'd be happy to look more into what is causing the issue on my own if someone could point me in the right direction!

@JeffBezanson
Sponsor Member

Are any calls to C code involved?

To make progress, we will probably need to be able to see the code and reproduce the crash. If you can't post it publicly, you can email it to me so I can debug.

@jgoldfar
Contributor Author

jgoldfar commented Dec 2, 2014

I can for sure answer no to the first query; I'm not calling any C myself. So far I'm not even using packages for core functionality, so I don't think that would be the issue.

I'll look into whether I can open the repository or produce a MWE. Thanks for your help

@jgoldfar
Contributor Author

jgoldfar commented Dec 3, 2014

The segfault has no dependency on the context other than apparently the composite types I have defined for parameters to PDE/optimization solvers and how their arguments are constructed. To make examples, I include parameter functions from a Julia file and reference those functions to create a composite parameter type; these parameters are passed around and apparently "setting" them back into the calling scope is causing the issue. I've narrowed down the issue to immediately before returning from the outermost call.

Unfortunately (since I wanted to implement a special case) there is a good amount of duplication between my code and Optim.jl. I will likely use something like that down the line; my point in remarking here is that I am close to being able to put up a minimal working example, but since I don't know how to narrow it down any further, it may be a bit verbose. Thanks for your patience.

@jgoldfar
Contributor Author

jgoldfar commented Dec 4, 2014

I've reduced the codes to just what is necessary to reproduce the issue. The error is now more verbose, and the updated backtrace is in the same gist at https://gist.github.com/jgoldfar/b311086329af529fff42

The MWE repo is at https://github.com/jgoldfar/isp-julia-mwe

By the way, I'm not sure I'm writing these codes in the most "Julian" way... The reason why it is structured this way is mostly due to how I reason about the underlying problem. I would like to keep the example parameter functions separated from the solvers, and be able to retain the same PDE data between solver runs. If there's a better way to achieve these goals, let me know!

Thanks again

@jgoldfar
Contributor Author

jgoldfar commented Dec 4, 2014

Since I have to make progress on this problem, I've figured out where the error is (but not what it is.) Somehow, Julia was not happy with how the parameter functions were being constructed. Before, the example solutions were being taken in by my "ExData" object, manipulated (composed into other functions) in the creation of the "SolvData" object, and then passed into my functional generator. Somehow Julia was losing track of where these functions are.

Doing the manipulation by hand and plugging into a version of the functional generator taking all of these parameter functions runs fine. Even before, the error would happen after the functional had finished defining everything it had to, somewhere after the "return." Hope that helps narrow down the issue...

@vchuravy
Member

vchuravy commented Dec 4, 2014

If I replace https://github.com/jgoldfar/isp-julia-mwe/blob/67ca0b1f629aab127f9529832af276316a6bd5a8/LocalExperiment.jl#L26 with

    T1 = eltype(v0)
    func = ISP.SimpleTest(sd, mu_actual, nu_actual, v0, zero(T1), [one(T1), one(T1)], delta, l, R, v0[1], true)

which should be the correct call to the constructor, I no longer get any errors. So I suppose the problem might be in https://github.com/jgoldfar/isp-julia-mwe/blob/67ca0b1f629aab127f9529832af276316a6bd5a8/src/Functionals.jl#L10-L18

Also, doing include(joinpath(dirname(@__FILE__), "globSettings.jl")) should not really be necessary, since include resolves relative paths against the directory of the file that contains the call: http://docs.julialang.org/en/release-0.3/stdlib/base/?highlight=include#Base.include
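To illustrate the point about relative paths, here is a minimal sketch. It assumes a recent Julia (1.x, for write(path, string)); the temporary directory, the file main.jl and the constant TOL are hypothetical names made up for the example and are not part of the MWE repository.

```julia
# Hypothetical layout, created in a temporary directory purely for illustration.
dir = mktempdir()
write(joinpath(dir, "globSettings.jl"), "const TOL = 1e-8\n")   # TOL is a made-up setting
write(joinpath(dir, "main.jl"), "include(\"globSettings.jl\")\n")

# Run from a different working directory: the nested include should still
# find globSettings.jl, because its path is resolved relative to main.jl,
# not relative to the current working directory.
cd(tempdir()) do
    include(joinpath(dir, "main.jl"))
end

println(TOL)
```

If the nested include were resolved against the working directory instead, the call inside main.jl would fail with a missing-file error, which is the situation the joinpath(dirname(@__FILE__), ...) idiom guards against.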

Hope that helps. It might be best to discuss this issue further on the julia-users mailing list.

Best Valentin

@jgoldfar
Contributor Author

jgoldfar commented Dec 4, 2014

Thanks for the fix! I guess I did expect a segfault from what looks like a call without the correct type, based on your fix (?) Thanks also for the advice; since I wanted to "dive in" as it were, the lack of relative path usage is among the least of my worries, when it comes to misunderstanding. I'm sure they're that way because some time back something broke and that (seemed to) fix it.
Regards, Max

@jgoldfar jgoldfar closed this as completed Dec 4, 2014
@JeffBezanson JeffBezanson reopened this Dec 4, 2014
@JeffBezanson
Sponsor Member

I might be missing something, but it's generally never OK for julia to segfault. Any insight into why there was a segfault instead of an error message?

@jgoldfar
Contributor Author

jgoldfar commented Dec 4, 2014

Sorry about that, my mistake. This is not much more illuminating to me, but I noticed that after replacing the absolute paths with relative paths in the LocalExperiment file (see this gist), the code no longer segfaults. It does give an error, though:

ERROR: type: doISP: in apply, expected Function, got Function
 in doISP at /Users/jgoldfar/Documents/research/optcont/JCode/LocalExperiment.jl:6
 in include at /Applications/LightTable/Julia.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include_from_node1 at loading.jl:128
 in process_options at /Applications/LightTable/Julia.app/Contents/Resources/julia/lib/julia/sys.dylib
 in _start at /Applications/LightTable/Julia.app/Contents/Resources/julia/lib/julia/sys.dylib (repeats 2 times)
while loading /Users/jgoldfar/Documents/research/optcont/JCode/LocalExperiment.jl, in expression starting on line 29

@vchuravy
Member

vchuravy commented Dec 4, 2014

The weird thing for me is that if I comment out the last line of doISP in the gist posted above by @jgoldfar, the error disappears (and so do the segfaults). If I then move the line include("ex3.jl") out of the function, I get the following error:

ERROR: access to undefined reference
 in doISP at /home/wallnuss/src/experiments_in_julia/isp-julia-mwe/LocalExperiment.jl:11
 in include_3B_211 at /usr/bin/../lib/julia/sys.so
 in include_from_node1 at loading.jl:128
 in process_options_3B_1730 at /usr/bin/../lib/julia/sys.so
 in _start_3B_1717 at /usr/bin/../lib/julia/sys.so (repeats 2 times)
while loading /home/wallnuss/src/experiments_in_julia/isp-julia-mwe/LocalExperiment.jl, in expression starting on line 29

The same holds here: without the last line, the code runs without a problem.

There is also no difference in the @code_typed and @code_lowered output with the last line commented out, apart from the last line itself.

@JeffBezanson
Sponsor Member

Test reduction is one of my hobbies. This simple file still segfaults for me:

SimpleTest{T1<:Real}(pdedata, mu_actual::Vector{T1},
                     nu_actual::Vector{T1}, v0::Vector{T1},
                     epsilon::T1, beta::Vector{T1},
                     delta::T1, l::T1, R::T1, s0::T1,
                     show_trace::Bool = true) = 0.0

SimpleTest{T1<:Real}(pdedata, mu_actual::Vector{T1},
                     nu_actual::Vector{T1}, v0::Vector{T1},
                     epsilon::T1, beta::Vector{T1},
                     delta::T1, l::T1, R::T1) =
                         SimpleTest(pdedata, mu_actual, nu_actual, v0, epsilon, beta, delta, l, R, v0[1])

function foo()
    v0 = rand(10)
    mu_actual = rand(10)
    nu_actual = rand(10)
    SimpleTest(0.0, mu_actual, nu_actual, v0, 0.0, [1.0,1.0], 0.5, 5.0, 20.0)
end

foo()

Also segfaults with 0.3. Disabling LLVM optimization passes makes the segfault go away (FPM->run(*f) in codegen.cpp).
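For anyone retrying this on a newer Julia: the f{T}(...) method syntax used above is from the 0.3 era and was later replaced by where clauses. The following is a sketch of the same reduction in where syntax, assuming Julia 0.6 or later; only the syntax is changed, and the variable result is added for illustration. On a build where the bug is fixed, I would expect it to simply return 0.0 rather than crash.

```julia
# Same reduction as above, rewritten with `where` clauses (Julia 0.6 and later).
# The trailing default argument makes this one definition cover both the
# 10-argument and the 11-argument call.
SimpleTest(pdedata, mu_actual::Vector{T1},
           nu_actual::Vector{T1}, v0::Vector{T1},
           epsilon::T1, beta::Vector{T1},
           delta::T1, l::T1, R::T1, s0::T1,
           show_trace::Bool = true) where {T1<:Real} = 0.0

# 9-argument convenience method that forwards to the 10-argument call,
# filling in s0 with v0[1].
SimpleTest(pdedata, mu_actual::Vector{T1},
           nu_actual::Vector{T1}, v0::Vector{T1},
           epsilon::T1, beta::Vector{T1},
           delta::T1, l::T1, R::T1) where {T1<:Real} =
    SimpleTest(pdedata, mu_actual, nu_actual, v0, epsilon, beta, delta, l, R, v0[1])

function foo()
    v0 = rand(10)
    mu_actual = rand(10)
    nu_actual = rand(10)
    SimpleTest(0.0, mu_actual, nu_actual, v0, 0.0, [1.0, 1.0], 0.5, 5.0, 20.0)
end

result = foo()   # `result` is an illustrative name, not in the original file
```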

@JeffBezanson JeffBezanson added the bug label Dec 5, 2014
@JeffBezanson
Sponsor Member

@Keno @vtjnash Might want to try with llvm-3.5 or llvm-svn?

@vtjnash
Sponsor Member

vtjnash commented Dec 5, 2014

I think this is fixed in llvm-3.5.0

@hayd
Member

hayd commented Jan 20, 2016

Jeff's example segfaults in the repl for me on master with 3.7.1.

@tkelman
Contributor

tkelman commented Jan 20, 2016

what platform? we're not testing osx on travis right now because it's too slow, so hard to say whether #14687 would have triggered whatever you're seeing

@yuyichao
Contributor

Which platform is this and what's the error message?

@hayd
Member

hayd commented Jan 20, 2016

This is on OSX...

julia> versioninfo()
Julia Version 0.5.0-dev+2172
Commit 3309d89 (2016-01-17 05:12 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1

julia> # paste in Jeff's example from above
[1]    3422 segmentation fault  ~/projects/julia/julia

@vtjnash
Sponsor Member

vtjnash commented Jan 20, 2016

this might be another instance of #9770, but i suspect it may actually be type-inference limit-tuple-type that is misbehaving. I think i just ran into a more likely cause of this bug a few days ago.

@vtjnash vtjnash self-assigned this Jan 20, 2016
@tkelman tkelman removed the needs tests label Jan 21, 2016
@vtjnash vtjnash added the needs tests label and removed the bug label Mar 25, 2016
@vtjnash
Sponsor Member

vtjnash commented Mar 25, 2016

fixed by #15300, this would be a good case to add as an additional inference test before closing

@tkelman tkelman removed the needs tests label Aug 10, 2017