
Resolve CI testing failure for Lazy Tensor Core #1088

Merged: 5 commits on Jul 25, 2022

Conversation

@henrytwo (Member) commented Jul 20, 2022

  • Xfails e2e tests for unsupported ops, allowing CI tests to pass
  • Removes now-passing tests from the xfail set
  • Adds dynamic_ir.cpp to the source list, which resolves the free() error in CI
  • Registers FuncDialect, which resolves the MLIR error in CI
  • Enables e2e LTC tests for macOS and source builds
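The xfail mechanism above can be sketched as follows. This is an illustrative snippet, not torch-mlir's actual harness code; the set and test names are hypothetical. The key behavior is that known-unsupported ops are expected to fail (keeping CI green), and a test that unexpectedly passes is flagged so it can be removed from the set:

```python
# Hypothetical xfail set; real torch-mlir test names will differ.
LTC_XFAIL_SET = {"UnsupportedOpModule_basic"}

def classify(test_name: str, passed: bool) -> str:
    """Map a raw pass/fail result through the xfail set."""
    if test_name in LTC_XFAIL_SET:
        # An unexpected pass means the op now works: prune it from the set.
        return "XPASS" if passed else "XFAIL"
    return "PASS" if passed else "FAIL"
```

Under this scheme an XFAIL does not fail the CI run, while an XPASS signals that the xfail set is stale.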

cc: @ke1337 @antoniojkim

@henrytwo henrytwo self-assigned this Jul 20, 2022
@antoniojkim (Collaborator)

We're still seeing

free(): invalid pointer

which is causing a whole bunch of tests to fail. We're unable to reproduce this locally.

@silvasean @powderluv How should we proceed? What can be done to try and root cause this issue?

@silvasean (Contributor)

> We're still seeing
>
> free(): invalid pointer
>
> which is causing a whole bunch of tests to fail. We're unable to reproduce this locally.
>
> @silvasean @powderluv How should we proceed? What can be done to try and root cause this issue?

Did this ever work? If so, can we bisect to find where it failed?

@henrytwo (Member, Author)

> > We're still seeing
> >
> > free(): invalid pointer
> >
> > which is causing a whole bunch of tests to fail. We're unable to reproduce this locally.
> >
> > @silvasean @powderluv How should we proceed? What can be done to try and root cause this issue?
>
> Did this ever work? If so, can we bisect to find where it failed?

We've never had it run successfully through CI before.

@silvasean (Contributor)

Does passing -s (single process) on the CI avoid the issue? This does sound really painful to debug... thinking...

@henrytwo (Member, Author)

> Does passing -s (single process) on the CI avoid the issue? This does sound really painful to debug... thinking...

Erm, now it dies without running the test?

Run cd $GITHUB_WORKSPACE
free(): invalid pointer
/home/runner/work/_temp/c3431cb3-81e1-42fe-8eef-24f84e991a68.sh: line 3: 17199 Aborted                 (core dumped) python -m e2e_testing.torchscript.main --config=lazy_tensor_core -v -s
Error: Process completed with exit code 134.

@silvasean (Contributor)

> > Does passing -s (single process) on the CI avoid the issue? This does sound really painful to debug... thinking...
>
> Erm, now it dies without running the test?
>
> Run cd $GITHUB_WORKSPACE
> free(): invalid pointer
> /home/runner/work/_temp/c3431cb3-81e1-42fe-8eef-24f84e991a68.sh: line 3: 17199 Aborted                 (core dumped) python -m e2e_testing.torchscript.main --config=lazy_tensor_core -v -s
> Error: Process completed with exit code 134.

Oh, that is good and expected. That means that the multiprocessing was swallowing the crash by restarting the worker process. I think from here you can run the testing process under gdb/lldb if it is in the VM image, and have it break on free and then backtrace.
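The exit code itself is informative here. A minimal sketch of why 134 appears, and why a multiprocessing pool can hide it (assumes Linux and CPython; the child process simulating the crash is illustrative):

```python
import signal
import subprocess
import sys

# Simulate a worker whose libc detects free() corruption: glibc calls
# abort(), which raises SIGABRT (signal 6).
proc = subprocess.run([sys.executable, "-c", "import os; os.abort()"])

# subprocess encodes death-by-signal as a negative return code ...
assert proc.returncode == -signal.SIGABRT

# ... which a shell (and the CI log) reports as 128 + signal = 134.
assert 128 + signal.SIGABRT == 134
```

A parent that restarts dead workers, as the multiprocessing test runner does, never surfaces this code to CI, which is why `-s` (single process) finally exposed the crash.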

This appears to be a memory corruption issue, so likely running it locally with AddressSanitizer would flag it, even if locally there are no symptoms on a normal build.

btw, have you verified that locally you are doing the same Release+Asserts build as CI? This is the command line:

cd $GITHUB_WORKSPACE
  mkdir build
  cd build
  cmake $GITHUB_WORKSPACE/externals/llvm-project/llvm -GNinja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_LINKER=lld \
    -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
    -DPython3_EXECUTABLE=$(which python) \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_PROJECTS=mlir \
    -DLLVM_EXTERNAL_PROJECTS="torch-mlir;torch-mlir-dialects" \
    -DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR="$GITHUB_WORKSPACE" \
    -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR="${GITHUB_WORKSPACE}/external/llvm-external-projects/torch-mlir-dialects" \
    -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
    -DLLVM_TARGETS_TO_BUILD=host
  ninja check-torch-mlir-all

@antoniojkim (Collaborator)

> btw, have you verified that locally you are doing the same Release+Asserts build as CI?

When I try to run the same cmake command as CI, I get the following error:

torch-mlir-dialects: command not found

And when I run ninja check-torch-mlir-all using the build that we already had, everything passes without any problems.

@henrytwo (Member, Author)

For those following along, we got a stack trace: https://github.com/llvm/torch-mlir/runs/7452705521?check_suite_focus=true

free(): invalid pointer
Thread 1 "python" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7a0b859 in __GI_abort () at abort.c:79
#2  0x00007ffff7a7626e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7ba0298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff7a7e2fc in malloc_printerr (str=str@entry=0x7ffff7b9e4c1 "free(): invalid pointer") at malloc.c:5347
#4  0x00007ffff7a7fb2c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4173
#5  0x00007fffde851738 in c10::impl::BoxedKernelWrapper<at::Tensor (at::Tensor const&, at::Tensor const&), void>::call(void (*)(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*), c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#6  0x00007fffdeb0d841 in at::_ops::mm::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#7  0x00007fffe009907a in torch::autograd::VariableType::(anonymous namespace)::mm(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#8  0x00007fffe0099eb3 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&), &torch::autograd::VariableType::(anonymous namespace)::mm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#9  0x00007fffdeb514d6 in at::_ops::mm::call(at::Tensor const&, at::Tensor const&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#10 0x00007ffff6484650 in torch::autograd::THPVariable_mm(_object*, _object*, _object*) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#11 0x00007ffff7d6fdc2 in cfunction_call (func=func@entry=0x7fffdc44f590, args=args@entry=0x7fffcb03b080, kwargs=kwargs@entry=0x0) at Objects/methodobject.c:543
#12 0x00007ffff7d48420 in _PyObject_MakeTpCall (tstate=0x55555555c6b0, callable=0x7fffdc44f590, args=0x7fffcb31ba90, nargs=2, keywords=0x0) at Objects/call.c:191
#13 0x00007ffff7daf618 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffcb31ba90, callable=0x7fffdc44f590, tstate=<optimized out>) at ./Include/cpython/abstract.h:116
#14 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffcb31ba90, callable=0x7fffdc44f590, tstate=<optimized out>) at ./Include/cpython/abstract.h:103
#15 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fffcb31ba90, callable=0x7fffdc44f590) at ./Include/cpython/abstract.h:127
#16 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#17 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3489
#18 0x00007ffff7d48d33 in _PyEval_EvalFrame (throwflag=0, f=0x7fffcb31b900, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#19 function_code_fastcall (tstate=0x55555555c6b0, co=<optimized out>, args=<optimized out>, nargs=3, globals=<optimized out>) at Objects/call.c:330
#20 0x00007ffff7d4a874 in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>, tstate=<optimized out>) at ./Include/cpython/abstract.h:118
#21 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/classobject.c:61
#22 0x00007ffff7d498b3 in PyVectorcall_Call (callable=0x7fffcdfbe6c0, tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
#23 0x00007ffff7daed7b in do_call_core (kwdict=0x0, callargs=0x7fffcad35840, func=0x7fffcdfbe6c0, tstate=<optimized out>) at Python/ceval.c:5125
#24 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3582
#25 0x00007ffff7d48d33 in _PyEval_EvalFrame (throwflag=0, f=0x7fffcad39440, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#26 function_code_fastcall (tstate=0x55555555c6b0, co=<optimized out>, args=<optimized out>, nargs=3, globals=<optimized out>) at Objects/call.c:330
#27 0x00007ffff7daaeab in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x555558929410, callable=0x7fffce221040, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#28 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x555558929410, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#29 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#30 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3506
#31 0x00007ffff7d48d33 in _PyEval_EvalFrame (throwflag=0, f=0x555558929270, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#32 function_code_fastcall (tstate=0x55555555c6b0, co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>) at Objects/call.c:330
#33 0x00007ffff7daabd9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffcb597e18, callable=0x7fffd5abfa60, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#34 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fffcb597e18, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#35 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#36 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3520
#37 0x00007ffff7da9c60 in _PyEval_EvalFrame (throwflag=0, f=0x7fffcb597c80, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#38 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x55555863c8b0, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x7ffff724e280, name=0x7ffff7507030, qualname=0x7ffff729ff80) at Python/ceval.c:4329
#39 0x00007ffff7d48b0b in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
#40 0x00007ffff7daabd9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x55555863c8a8, callable=0x7fffcad33ca0, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#41 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x55555863c8a8, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#42 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#43 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3520
#44 0x00007ffff7da9c60 in _PyEval_EvalFrame (throwflag=0, f=0x55555863c6c0, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#45 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x7fffdc2f7f28, kwcount=<optimized out>, kwstep=1, defs=0x7ffff729ef58, defcount=1, kwdefs=0x0, closure=0x0, name=0x7ffff72c4bf0, qualname=0x7ffff72c4bf0) at Python/ceval.c:4329
#46 0x00007ffff7d48b0b in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
#47 0x00007ffff7daabd9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffdc2f7f10, callable=0x7fffd5abfb80, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#48 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fffdc2f7f10, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#49 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#50 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3520
#51 0x00007ffff7da9c60 in _PyEval_EvalFrame (throwflag=0, f=0x7fffdc2f7d60, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#52 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x7ffff747fa70, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff748e770, qualname=0x7ffff748e770) at Python/ceval.c:4329
#53 0x00007ffff7d48b0b in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
#54 0x00007ffff7daabd9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff747fa70, callable=0x7fffcad339d0, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#55 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff747fa70, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#56 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#57 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3520
#58 0x00007ffff7da9178 in _PyEval_EvalFrame (throwflag=0, f=0x7ffff747f900, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#59 _PyEval_EvalCode (qualname=0x0, name=<optimized out>, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwstep=2, kwcount=<optimized out>, kwargs=0x0, kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<optimized out>, tstate=0x55555555c6b0) at Python/ceval.c:4329
#60 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4361
#61 0x00007ffff7da8ec7 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4377
#62 0x00007ffff7e2f71f in PyEval_EvalCode (co=co@entry=0x7ffff7266030, globals=globals@entry=0x7ffff7488280, locals=locals@entry=0x7ffff7488280) at Python/ceval.c:828
#63 0x00007ffff7e2e2b1 in builtin_exec_impl (module=<optimized out>, locals=0x7ffff7488280, globals=0x7ffff7488280, source=0x7ffff7266030) at Python/bltinmodule.c:1026
#64 builtin_exec (module=<optimized out>, args=args@entry=0x555555621f90, nargs=<optimized out>) at Python/clinic/bltinmodule.c.h:396
#65 0x00007ffff7d6fcf8 in cfunction_vectorcall_FASTCALL (func=0x7ffff74dfd60, args=0x555555621f90, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/methodobject.c:430
#66 0x00007ffff7daabd9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x555555621f90, callable=0x7ffff74dfd60, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#67 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x555555621f90, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#68 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#69 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3520
#70 0x00007ffff7da9c60 in _PyEval_EvalFrame (throwflag=0, f=0x555555621dd0, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#71 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x5555555fabf0, kwcount=<optimized out>, kwstep=1, defs=0x7ffff73c03c8, defcount=5, kwdefs=0x0, closure=0x0, name=0x7ffff73becf0, qualname=0x7ffff73becf0) at Python/ceval.c:4329
#72 0x00007ffff7d48b0b in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
#73 0x00007ffff7daabd9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x5555555fabc8, callable=0x7ffff73b4b80, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#74 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5555555fabc8, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#75 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#76 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3520
#77 0x00007ffff7da9c60 in _PyEval_EvalFrame (throwflag=0, f=0x5555555faa20, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#78 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x7ffff73be768, kwcount=<optimized out>, kwstep=1, defs=0x7ffff744ac58, defcount=1, kwdefs=0x0, closure=0x0, name=0x7ffff73c0170, qualname=0x7ffff73c0170) at Python/ceval.c:4329
#79 0x00007ffff7d48b0b in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
#80 0x00007ffff7d498b3 in PyVectorcall_Call (callable=0x7ffff72d09d0, tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
#81 0x00007ffff7e4aed7 in pymain_run_module (modname=<optimized out>, set_argv0=<optimized out>) at Modules/main.c:291
#82 0x00007ffff7e4abde in pymain_run_python (exitcode=0x7fffffffd760) at Modules/main.c:592
#83 Py_RunMain () at Modules/main.c:677
#84 0x00007ffff7e4a6dd in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#85 0x00007ffff7a0d083 in __libc_start_main (main=0x555555555060 <main>, argc=6, argv=0x7fffffffd968, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd958) at ../csu/libc-start.c:308
#86 0x000055555555509e in _start ()

@henrytwo changed the title from "XFail unsupported ops" to "Resolve CI testing failure for Lazy Tensor Core" on Jul 21, 2022
@silvasean (Contributor)

Awesome, that's a good first step. It sounds like ASan would catch this bug and give a very clear diagnosis. ASan is probably pretty hard to set up for the full Python/etc. e2e test, but a small .cpp file with just a main() and a few calls into libtorch + LTC might be doable. Let me know how your debugging goes and I can get more hands-on (vs. armchair debugging this) if you folks need.

@silvasean (Contributor) commented Jul 21, 2022

And by ASan I mean a local ASan -- I suspect that the issue exists in the local build but is somehow not causing any symptoms locally (e.g. your local libc/malloc doesn't do as strict checking) - ASan will see it, though.

@powderluv (Collaborator)

Thanks for continuing to debug this. Eventually we can add an ASan builder (at least for the C++ parts).

@henrytwo force-pushed the henrytu/xfail_unsupported_ops branch from ca6d89e to 29922a2 on July 21, 2022 at 20:49
@henrytwo (Member, Author)

I made a new PR for experimenting with CI so everyone doesn't get spammed with emails: #1095

Once a solution is found, I'll bring it back over here.

@silvasean (Contributor)

> > btw, have you verified that locally you are doing the same Release+Asserts build as CI?
>
> When I try to run the same cmake command as CI, I get the following error:
>
> torch-mlir-dialects: command not found
>
> And when I run ninja check-torch-mlir-all using the build that we already had, it all passes without any problems

Is your -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR set correctly? Where is that error coming from?

@antoniojkim (Collaborator)

> Is your -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR set correctly? Where is that error coming from?

Yes, it was set to

-DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR="${GITHUB_WORKSPACE}/externals/llvm-external-projects/torch-mlir-dialects"

Honestly, I'm not sure where that error is coming from. It's not something that's encountered when running cmake via the setup.py.

@henrytwo (Member, Author)

Here are some updates on the situation. It looks like we forgot to add dynamic_ir.cpp to the CMake file, so some TS backend classes were used instead of ours, which likely resulted in the free() error. Now that that's fixed, there's a new problem related to MLIR:
https://github.com/llvm/torch-mlir/runs/7475488042?check_suite_focus=true

graph(%p0 : Float(1, 5)):
  %1 : Float(1, 5) = aten::tanh(%p0)
  return (%p0, %1)
LLVM ERROR: func.func created with unregistered dialect. If this is intended, please call allowUnregisteredDialects() on the MLIRContext, or use -allow-unregistered-dialect with the MLIR tool used.
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
Thread 1 "python" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7a0b859 in __GI_abort () at abort.c:79
#2  0x00007fffcefdd710 in llvm::report_fatal_error(llvm::Twine const&, bool) () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.so
#3  0x00007fffcf0e52ec in mlir::Operation::Operation(mlir::Location, mlir::OperationName, unsigned int, unsigned int, unsigned int, mlir::DictionaryAttr, bool) () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.so
#4  0x00007fffcf0e4bac in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::NamedAttrList&&, mlir::BlockRange, unsigned int) () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.so
#5  0x00007fffcf0e4892 in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::NamedAttrList&&, mlir::BlockRange, mlir::RegionRange) () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.so
#6  0x00007fffcf0e4826 in mlir::Operation::create(mlir::OperationState const&) () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.so
#7  0x00007fffcef12f2f in mlirOperationCreate () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.so
#8  0x00007fffcdc9eaf9 in torch_mlir::importJitFunctionAsFuncOp(MlirContext, torch::jit::Function*, std::function<MlirAttribute (int)>, torch_mlir::ImportOptions const&) () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/_jit_ir_importer.cpython-39-x86_64-linux-gnu.so
#9  0x00007fffcdb87738 in torch::lazy::TorchMlirLoweringContext::Build() () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/lib_mlir_ltc.so
#10 0x00007fffe0f218a8 in torch::lazy::LazyGraphExecutor::Compile(std::vector<c10::intrusive_ptr<torch::lazy::LazyTensor, c10::detail::intrusive_target_default_null_type<torch::lazy::LazyTensor> >, std::allocator<c10::intrusive_ptr<torch::lazy::LazyTensor, c10::detail::intrusive_target_default_null_type<torch::lazy::LazyTensor> > > > const&, c10::ArrayRef<std::string>, torch::lazy::LazyGraphExecutor::SyncTensorCollection const&, torch::lazy::LazyGraphExecutor::PostOrderData*) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#11 0x00007fffe0f25129 in torch::lazy::LazyGraphExecutor::SyncTensorsGraphInternal(std::vector<c10::intrusive_ptr<torch::lazy::LazyTensor, c10::detail::intrusive_target_default_null_type<torch::lazy::LazyTensor> >, std::allocator<c10::intrusive_ptr<torch::lazy::LazyTensor, c10::detail::intrusive_target_default_null_type<torch::lazy::LazyTensor> > > >*, c10::ArrayRef<std::string>, torch::lazy::LazyGraphExecutor::SyncTensorsConfig const&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#12 0x00007fffe0f25a91 in torch::lazy::LazyGraphExecutor::SyncTensorsGraph(std::vector<c10::intrusive_ptr<torch::lazy::LazyTensor, c10::detail::intrusive_target_default_null_type<torch::lazy::LazyTensor> >, std::allocator<c10::intrusive_ptr<torch::lazy::LazyTensor, c10::detail::intrusive_target_default_null_type<torch::lazy::LazyTensor> > > >*, c10::ArrayRef<std::string>, bool, bool) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#13 0x00007fffe0f2604a in torch::lazy::LazyGraphExecutor::SyncLiveTensorsGraph(torch::lazy::BackendDevice const*, c10::ArrayRef<std::string>, bool) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#14 0x00007ffff69b3057 in pybind11::cpp_function::initialize<torch::lazy::initLazyBindings(_object*)::{lambda(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, bool)#1}, void, std::string const&, std::vector<std::string, std::allocator<std::string> > const&, bool, pybind11::name, pybind11::scope, pybind11::sibling, pybind11::arg_v, pybind11::arg, pybind11::arg_v>(torch::lazy::initLazyBindings(_object*)::{lambda(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, bool)#1}&&, void (*)(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, bool), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::arg_v const&, pybind11::arg const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#15 0x00007ffff635818d in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#16 0x00007ffff7d6fdc2 in cfunction_call (func=0x7fffdc30df90, args=<optimized out>, kwargs=<optimized out>) at Objects/methodobject.c:543
#17 0x00007ffff7d484b8 in _PyObject_MakeTpCall (tstate=0x55555555c6b0, callable=0x7fffdc30df90, args=0x7fffd4bf8580, nargs=<optimized out>, keywords=<optimized out>) at Objects/call.c:191
#18 0x00007ffff7daf7fa in _PyObject_VectorcallTstate (kwnames=0x7ffff744a580, nargsf=<optimized out>, args=0x7fffd4bf8580, callable=0x7fffdc30df90, tstate=<optimized out>) at ./Include/cpython/abstract.h:116
#19 _PyObject_VectorcallTstate (kwnames=0x7ffff744a580, nargsf=<optimized out>, args=0x7fffd4bf8580, callable=0x7fffdc30df90, tstate=<optimized out>) at ./Include/cpython/abstract.h:103
#20 PyObject_Vectorcall (kwnames=0x7ffff744a580, nargsf=<optimized out>, args=<optimized out>, callable=0x7fffdc30df90) at ./Include/cpython/abstract.h:127
#21 call_function (kwnames=0x7ffff744a580, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=<optimized out>) at Python/ceval.c:5077
#22 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3537
#23 0x00007ffff7d48d33 in _PyEval_EvalFrame (throwflag=0, f=0x7fffd4bf8400, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#24 function_code_fastcall (tstate=0x55555555c6b0, co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>) at Objects/call.c:330
#25 0x00007ffff7daf10e in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x5555555b9c48, callable=0x7fffd488ea60, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#26 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5555555b9c48, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#27 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#28 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3489
#29 0x00007ffff7da9178 in _PyEval_EvalFrame (throwflag=0, f=0x5555555b9ad0, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#30 _PyEval_EvalCode (qualname=0x0, name=<optimized out>, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwstep=2, kwcount=<optimized out>, kwargs=0x0, kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<optimized out>, tstate=0x55555555c6b0) at Python/ceval.c:4329
#31 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4361
#32 0x00007ffff7da8ec7 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4377
#33 0x00007ffff7e2f71f in PyEval_EvalCode (co=co@entry=0x7ffff73b3190, globals=globals@entry=0x7ffff74881c0, locals=locals@entry=0x7ffff74881c0) at Python/ceval.c:828
#34 0x00007ffff7e4230d in run_eval_code_obj (tstate=0x55555555c6b0, co=0x7ffff73b3190, globals=0x7ffff74881c0, locals=0x7ffff74881c0) at Python/pythonrun.c:1221
#35 0x00007ffff7e4229b in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7ffff74881c0, locals=0x7ffff74881c0, flags=<optimized out>, arena=<optimized out>) at Python/pythonrun.c:1242
#36 0x00007ffff7ce7338 in pyrun_file (fp=fp@entry=0x555555559340, filename=filename@entry=0x7ffff73ad630, start=start@entry=257, globals=globals@entry=0x7ffff74881c0, locals=locals@entry=0x7ffff74881c0, closeit=closeit@entry=1, flags=0x7fffffffd788) at Python/pythonrun.c:1140
#37 0x00007ffff7ce70c4 in pyrun_simple_file (flags=0x7fffffffd788, closeit=1, filename=0x7ffff73ad630, fp=0x555555559340) at Python/pythonrun.c:450
#38 PyRun_SimpleFileExFlags (fp=fp@entry=0x555555559340, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffd788) at Python/pythonrun.c:483
#39 0x00007ffff7ce7fb3 in PyRun_AnyFileExFlags (fp=fp@entry=0x555555559340, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffd788) at Python/pythonrun.c:92
#40 0x00007ffff7e4ab61 in pymain_run_file (cf=0x7fffffffd788, config=0x55555555cf50) at Modules/main.c:373
#41 pymain_run_python (exitcode=0x7fffffffd780) at Modules/main.c:598
#42 Py_RunMain () at Modules/main.c:677
#43 0x00007ffff7e4a6dd in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#44 0x00007ffff7a0d083 in __libc_start_main (main=0x555555555060 <main>, argc=2, argv=0x7fffffffd988, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd978) at ../csu/libc-start.c:308
#45 0x000055555555509e in _start ()

The behaviour is the same when running the e2e tests, but in this case I'm running a small example model to help isolate the source.

@henrytwo henrytwo force-pushed the henrytu/xfail_unsupported_ops branch from 23e9bb8 to c5e8448 Compare July 25, 2022 19:23
@silvasean
Copy link
Contributor

You probably need to call torchMlirRegisterAllDialects:

@henrytwo
Copy link
Member Author

You probably need to call torchMlirRegisterAllDialects:

Hmm the strange thing is that it is called: https://github.com/llvm/torch-mlir/blob/torch_mlir_ltc_backend/python/torch_mlir/csrc/base_lazy_backend/mlir_lowering_context.cpp#L279

@silvasean
Copy link
Contributor

I think we might also need the equivalent of torchMlirRegisterRequiredDialects: https://github.com/llvm/torch-mlir/pull/1084/files

@ashay I recall that patch got reverted -- is it okay for henry to just copy that function for now?

@henrytwo
Copy link
Member Author

Oh now that I think of it, mlir::func::FuncDialect is probably the only dialect I need to register. I'll try registering it directly and see what happens
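
For reference, registering a single dialect directly looks roughly like this. This is a hedged sketch, not a standalone-compilable program — it assumes an MLIR build is available and that `prepareContext` is a hypothetical helper name, not something from the torch-mlir codebase:

```cpp
// Sketch only: requires MLIR headers/libs to compile.
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/IR/MLIRContext.h"

void prepareContext(mlir::MLIRContext &context) {
  // Loading the dialect makes func.func / func.return ops creatable and
  // verifiable in this context; without it, building those ops fails at
  // runtime with an "unregistered dialect" style error.
  context.getOrLoadDialect<mlir::func::FuncDialect>();
}
```

Registering only the dialects the lowering actually emits ops for avoids pulling in the full `torchMlirRegisterAllDialects` surface.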

@henrytwo henrytwo force-pushed the henrytu/xfail_unsupported_ops branch from c5e8448 to bfe533f Compare July 25, 2022 21:18
@henrytwo henrytwo requested a review from silvasean July 25, 2022 21:19
@henrytwo
Copy link
Member Author

From my testing on another branch, this PR should enable e2e CI tests to run successfully; however, we still run into some issues during Build out-of-tree
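
For context, the xfail mechanism this PR leans on works roughly as below. This is a hedged sketch with made-up test names — torch-mlir keeps its real sets in an `xfail_sets` module in the e2e testing harness:

```python
# Illustrative xfail-set logic: known-failing tests are expected to fail,
# so a failure there does not fail CI, while an unexpected pass (XPASS)
# flags that the set needs updating.
LTC_XFAIL_SET = {"Conv2dNoPaddingModule_basic"}  # hypothetical entry

def report(results):
    """results: list of (test_name, passed). Returns per-test verdicts and CI status."""
    verdicts = []
    for name, passed in results:
        expected_failure = name in LTC_XFAIL_SET
        if passed and expected_failure:
            verdicts.append((name, "XPASS"))  # unexpectedly passed
        elif passed:
            verdicts.append((name, "PASS"))
        elif expected_failure:
            verdicts.append((name, "XFAIL"))  # known-bad, tolerated
        else:
            verdicts.append((name, "FAIL"))
    ci_ok = all(status not in ("FAIL", "XPASS") for _, status in verdicts)
    return verdicts, ci_ok
```

With unsupported ops moved into the set, a run that fails only on those tests still reports CI as green.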

@silvasean
Copy link
Contributor

From my testing on another branch, this PR should enable e2e CI tests to run successfully; however, we still run into some issues during Build out-of-tree

We were talking today at the developer hour about possibly removing the out-of-tree build -- @powderluv, this sounds like another data point.

@henrytwo
Copy link
Member Author

@silvasean assuming the other 2 tests pass, are there any other changes you want on this PR, or would we be good to merge into torch_mlir_ltc_backend?

@silvasean
Copy link
Contributor

@silvasean assuming the other 2 tests pass, are there any other changes you want on this PR, or would we be good to merge into torch_mlir_ltc_backend?

LGTM

@silvasean
Copy link
Contributor

@silvasean assuming the other 2 tests pass, are there any other changes you want on this PR, or would we be good to merge into torch_mlir_ltc_backend?

And btw, thanks for debugging this. I've been on the other end of these "debug iteration goes through GitHub Actions" things and it is incredibly painful. I can't imagine debugging a memory corruption issue like you did here!

@ashay
Copy link
Collaborator

ashay commented Jul 25, 2022

I recall that patch got reverted -- is it okay for henry to just copy that function for now?

Sorry for being late to the discussion. Henry, feel free to copy that function. Now that macOS builds run in CI, I should be able to fix any redundancy in a subsequent patch.

@henrytwo
Copy link
Member Author

I'm going to merge this into the LTC branch now. The failures during the source and macOS builds seem to be outside the scope of this PR, so they'll be addressed separately.

@henrytwo henrytwo merged commit 4106a7d into torch_mlir_ltc_backend Jul 25, 2022
@henrytwo henrytwo deleted the henrytu/xfail_unsupported_ops branch July 26, 2022 00:16
henrytwo added a commit that referenced this pull request Jul 29, 2022
* Xfail unsupported ops

* Register FuncDialect

* Include dynamic_ir in build

* Code reformat

* Enable LTC tests for macOS and Source Build
henrytwo added a commit that referenced this pull request Jul 29, 2022
* Xfail unsupported ops

* Register FuncDialect

* Include dynamic_ir in build

* Code reformat

* Enable LTC tests for macOS and Source Build
henrytwo added a commit that referenced this pull request Jul 30, 2022
* Xfail unsupported ops

* Register FuncDialect

* Include dynamic_ir in build

* Code reformat

* Enable LTC tests for macOS and Source Build
qedawkins pushed a commit to nod-ai/torch-mlir that referenced this pull request Oct 3, 2022
Signed-off-by: Gong Su <gong_su@hotmail.com>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
qedawkins pushed a commit to nod-ai/torch-mlir that referenced this pull request Oct 3, 2022
* Add check-onnx-backend to Mac CI. (llvm#1069)

* Add check-onnx-backend to Mac CI.

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Additional Docker help and split README for easier reading (llvm#1084)

* initial docker documentation

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* split README with no redundant place for info

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* respond to suggestions

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* specify that onnx-mlir.py script generates only code suitable to be exec in Linux and/or Docker env

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* fix checkdocs

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* responded to review suggestion on onnx-mlir --help

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* use ONNX-MLIR everywhere

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* add verify for concat

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* check all inputs

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Support filtering out lit tests based on targets (llvm#1087)

Currently we ignore what targets llvm was built for in the lit tests, but recent changes to onnx-mlir explicitly initialize the available targets.
This makes the corresponding change to the lit configuration, so that we can filter out the lit tests based on the available targets.

Signed-off-by: Stella Stamenova <stilis@microsoft.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Switch URLs to use main instead of master (llvm#1094)

Signed-off-by: Charles Volzka <cjvolzka@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Fix MacOS build badge (llvm#1092)

Signed-off-by: Gong Su <gong_su@hotmail.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* onnx-mlir.py warning about binary output (.so and .jar) (llvm#1090)

not directly usable if host is not Linux

Signed-off-by: Gong Su <gong_su@hotmail.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Make the doc example obey ONNX_MLIR_BUILD_TESTS (llvm#1083)

* Make the doc example obey ONNX_MLIR_BUILD_TESTS

Currently, ONNX_MLIR_BUILD_TESTS controls EXCLUDE_FROM_ALL, however, the targets added through add_executable will always build. We follow the llvm pattern and explicitly set EXCLUDE_FROM_ALL in the add_onnx_mlir_executable function if it is set for the directory, so that add_executable targets don't always build.

Signed-off-by: Stella Stamenova <stilis@microsoft.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Explicitly install into lib on all systems (llvm#1088)

Signed-off-by: Gong Su <gong_su@hotmail.com>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* add check (llvm#1098)

Signed-off-by: Tong Chen <chentong@us.ibm.com>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* fix typos and add ssh-client to dockerfile (llvm#1096)

* fix typos and add ssh-client to dockerfile

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* sync doc and script

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Emit print statement only when the verbose option is in effect. (llvm#1097)

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* format & refine code by request

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Support older versions 6, 11, 12 for Clip Op (llvm#1100)

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* using front to get first input

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* add 3 lit test for concat  verifier

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* add newline

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Add check-onnx-backend to Mac CI. (llvm#1069)

* Add check-onnx-backend to Mac CI.

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Additional Docker help and split README for easier reading (llvm#1084)

* initial docker documentation

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* split README with no redundant place for info

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* respond to suggestions

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* specify that onnx-mlir.py script generates only code suitable to be exec in Linux and/or Docker env

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* fix checkdocs

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* responded to review suggestion on onnx-mlir --help

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* use ONNX-MLIR everywhere

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Switch URLs to use main instead of master (llvm#1094)

Signed-off-by: Charles Volzka <cjvolzka@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Fix MacOS build badge (llvm#1092)

Signed-off-by: Gong Su <gong_su@hotmail.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* fix typos and add ssh-client to dockerfile (llvm#1096)

* fix typos and add ssh-client to dockerfile

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* sync doc and script

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Update document (llvm#1077)

* create

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* delete HowTOAddAnOperation.md

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* modify testing

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* create

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* delete HowTOAddAnOperation.md

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* modify testing

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* fix

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* create

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* add comment

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* delete HowTOAddAnOperation.md

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* modify testing

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* fix

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* create

Signed-off-by: Tong Chen <chentong@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Update LLVM level (llvm#1095)

* Update LLVM level to 700997aef8c1f2f08c9ac5fca61650b57a01e8b1

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Pass a type converter to all ONNX operations. (llvm#1102)

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Nuke KrnlDummyCastOp now that we use MLIR's UnrealizedConversionCastOp (llvm#1103)

* Nuke KrnlDummyCastOp now that we use MLIR's UnrealizedConversionCastOp

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>

* Remove a dependency in src/Dialect/Krnl/CMakeList.txt.  Regenerate docs via 'ninja onnx-mlir-docs'.

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Add --emitObj option to onnx-mlir (llvm#1104)

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* fix warnings (llvm#1093)

Signed-off-by: Ian Bearman <ianb@microsoft.com>

Co-authored-by: Stella Stamenova <stilis@microsoft.com>
Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Add -march option to onnx-mlir (llvm#1107)

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Fix Doc spelling and broken links, removed warnings about using main (llvm#1106)

* removed warning about main vs master in CONTRIBUTING, fixed links and spelling mistakes

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Update BuildONNX.md

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

Co-authored-by: Ettore Tiotto <etiotto@ca.ibm.com>
Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Co-authored-by: Stella Stamenova <stilis@microsoft.com>
Co-authored-by: Charles Volzka <42243335+cjvolzka@users.noreply.github.com>
Co-authored-by: gongsu832 <gong_su@hotmail.com>
Co-authored-by: chentong319 <chentong@us.ibm.com>
Co-authored-by: Tung D. Le <tung@jp.ibm.com>
Co-authored-by: Ian Bearman <ian.bearman@live.com>