Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory corruption issue in 1.17.3 #142

Closed
odow opened this issue Apr 7, 2020 · 12 comments
Closed

Memory corruption issue in 1.17.3 #142

odow opened this issue Apr 7, 2020 · 12 comments

Comments

@odow
Copy link

odow commented Apr 7, 2020

Dear all,

We're struggling to track down a memory corruption issue somewhere inside the Julia wrapper of Clp (jump-dev/Clp.jl#77).

Here is a reliable way to reproduce on Ubuntu. The ccalls call the corresponding method in the C interface. If you can tell from the stacktrace, it appears to be somewhere inside gutsOfDelete. I'm guessing it's columnLower_.

The reason being, I can only reproduce this when

  • the model is unbounded (e.g., it doesn't matter whether the objective coefficient is 1.0 or -1.0
  • the variable has a lower bound < -1e20
  • I use Ubuntu (cannot reproduce this on Mac).

My guess is that somewhere in Clp, a routine is touching some memory it shouldn't.

Any ideas for how to proceed?

Oscar

julia> import Clp_jll: libClp

julia> const LB = prevfloat(-1e20)  # Crash
-1.0000000000000002e20

julia> ccall(("Clp_VersionMajor", libClp), Cint, ())
1

julia> ccall(("Clp_VersionMinor", libClp), Cint, ())
17

julia> ccall(("Clp_VersionRelease", libClp), Cint, ())
5

julia> model = ccall(("Clp_newModel", libClp), Ptr{Cvoid}, ())
Ptr{Nothing} @0x0000000002ac40a0

julia> ccall(
           ("Clp_addColumns", libClp),
           Cvoid,
           (Ptr{Cvoid}, Cint, Ptr{Float64}, Ptr{Float64}, Ptr{Float64}, Ptr{Cint}, Ptr{Cint}, Ptr{Float64}),
           model, Cint(1), Float64[LB], C_NULL, Float64[1.0], C_NULL, C_NULL, C_NULL
       )

julia> ccall(("Clp_initialSolve", libClp), Cint, (Ptr{Cvoid},), model)
Clp3002W Empty problem - 0 rows, 1 columns and 0 elements
Clp0002I Dual infeasible - objective value 0
Clp0032I DualInfeasible objective 0 - 0 iterations time 0.002
2

julia> ccall(("Clp_deleteModel", libClp), Cvoid, (Ptr{Cvoid},), model)
free(): invalid pointer

signal (6): Aborted
in expression starting at REPL[9]:1
__libc_signal_restore_set at /build/glibc-OTsEL5/glibc-2.27/signal/../sysdeps/unix/sysv/linux/nptl-signals.h:80 [inlined]
raise at /build/glibc-OTsEL5/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:48
abort at /build/glibc-OTsEL5/glibc-2.27/stdlib/abort.c:79
__libc_message at /build/glibc-OTsEL5/glibc-2.27/libio/../sysdeps/posix/libc_fatal.c:181
malloc_printerr at /build/glibc-OTsEL5/glibc-2.27/malloc/malloc.c:5350
_int_free at /build/glibc-OTsEL5/glibc-2.27/malloc/malloc.c:4157 [inlined]
__libc_free at /build/glibc-OTsEL5/glibc-2.27/malloc/malloc.c:3124
_ZN8ClpModel12gutsOfDeleteEi at /home/odow/.julia/artifacts/9cf32528460420740b8ba41eb7a2833b8975b4c2/lib/libClp.so (unknown line)
_ZN8ClpModelD1Ev at /home/odow/.julia/artifacts/9cf32528460420740b8ba41eb7a2833b8975b4c2/lib/libClp.so (unknown line)
_ZN10ClpSimplexD1Ev at /home/odow/.julia/artifacts/9cf32528460420740b8ba41eb7a2833b8975b4c2/lib/libClp.so (unknown line)
Clp_deleteModel at /home/odow/.julia/artifacts/9cf32528460420740b8ba41eb7a2833b8975b4c2/lib/libClp.so (unknown line)
top-level scope at ./REPL[9]:1
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:808
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:764
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:843
eval at ./boot.jl:330
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2135 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:86
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:118 [inlined]
#26 at ./task.jl:333
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2135 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1631 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:659
unknown function (ip: 0xffffffffffffffff)
Allocations: 4395581 (Pool: 4394552; Big: 1029); GC: 4
Aborted

Here's the platform:

julia> versioninfo()
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
@odow odow mentioned this issue Apr 7, 2020
1 task
@odow
Copy link
Author

odow commented Apr 7, 2020

This is suspicious:

double maxXz = -1e20;

Do we need to truncate data to (-1e20, 1e20)?

This doesn't appear to be the issue.

Clp/src/ClpModel.cpp

Lines 2710 to 2713 in 4688136

double value = columnLower[iColumn];
if (value < -1.0e20)
value = -COIN_DBL_MAX;
columnLower_[iColumn] = value;

@odow odow changed the title Memory corruption issue in 1.17.5 Memory corruption issue in 1.17.3 Apr 7, 2020
@odow
Copy link
Author

odow commented Apr 7, 2020

Confirmed present in 1.17.3.

@tkralphs
Copy link
Member

tkralphs commented Apr 8, 2020

Looks like it shouldn't be too difficult to create as small example that replicates this in stand-alone Cbc. That would help a lot.

@odow
Copy link
Author

odow commented Apr 8, 2020

Unless someone beats me to it, my plan for this weekend is to run a git bisect and find the issue. This is blocking us upgrading a lot of the COIN solvers for JuMP.

@tkralphs
Copy link
Member

tkralphs commented Apr 8, 2020

That would be great, but since you're just directly calling library functions from Julia, wouldn't a simple program doing the identical calls from C++ exhibit the behavior? Then we'd be able to just debug it.

@odow
Copy link
Author

odow commented Apr 8, 2020

Then we'd be able to just debug it.

That too. Either way, it's my weekend project.

@odow
Copy link
Author

odow commented Apr 11, 2020

Bisected to 832b459, but turns out it is already fixed on master: 4688136

@ViralBShah, can we carry this patch for 1.17.5?

--- a/Clp/src/ClpModel.cpp
+++ b/Clp/src/ClpModel.cpp
@@ -3786,6 +3786,12 @@ int ClpModel::emptyProblem(int * infeasNumber, double * infeasSum, bool printMes
                          if (objValue) {
                               numberDualInfeasibilities++;;
                               sumDualInfeasibilities += fabs(objValue);
+                              badColumn = i;
+                              if (objValue > 0.0) {
+                                badValue = -1.0;
+                              } else {
+                                badValue = 1.0;
+                              }
                               returnCode |= 2;
                          }
                          status_[i] = 0;

Explanation

Reproducer

#include "Clp_C_Interface.h"
#include <stdio.h>
int main(int argc, const char * argv[]) {
    double lb = {-2e20};
    double c = {1.0};
    Clp_Simplex  * model = Clp_newModel();
    Clp_addColumns(model, 1, &lb, NULL, &c, NULL, NULL, NULL);
    Clp_initialSolve(model);
    Clp_deleteModel(model);
    return 0;
}

badColumn initialized as -1

int badColumn=-1;

And we have -inf/inf bounds so instead of falling into the if,

if (columnLower_[i] > -1.0e30 || columnUpper_[i] < 1.0e30) {

we end up in this else

Clp/Clp/src/ClpModel.cpp

Lines 3784 to 3792 in 832b459

} else {
columnActivity_[i] = 0.0;
if (objValue) {
numberDualInfeasibilities++;;
sumDualInfeasibilities += fabs(objValue);
returnCode |= 2;
}
status_[i] = 0;
}

which doesn't change badColumn. So we create the ray, but badColumn is still -1 and we end up poking things out-of-bounds:

Clp/Clp/src/ClpModel.cpp

Lines 3816 to 3822 in 832b459

if (returnCode == 2) {
// create ray
delete [] ray_;
ray_ = new double [numberColumns_];
CoinZeroN(ray_, numberColumns_);
ray_[badColumn]=badValue;
}

@ViralBShah
Copy link

Yes I can carry this patch. We can build binaries from any commit :-)

But if a 1.17.6 can be tagged in the next few days, that will be preferable.

@tkralphs
Copy link
Member

We should be able to make a new release with this patched. @svigerske what do you think?

@svigerske
Copy link
Member

yeah, I can do that

@odow
Copy link
Author

odow commented Apr 11, 2020

Thanks Stefan!

Is there a document that outlines the branching structure? I'm a little confused as to the difference between stable/1.17 and master. Do you just cherry-pick commits across? What is the process/decision point for creating a new release?

@svigerske
Copy link
Member

Development happens on trunk (aka master) and bugfixes are typically going into master and are cherry-picked to a "stable/" branch. Releases are made from this branch. Some documentation (not updates for a long while, though) is at https://github.com/coin-or-tools/BuildTools/wiki/pm-svn-releases

This scheme will probably change when development moves from svn to git.

A release like this one happens when someone feels like doing so, e.g., because it has been a while and fixes have been accumulated, or someone explicitly asks for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants