Segmentation fault after make runtest on Ubuntu 14.04/ppc64le #3539
Comments
Do you think this is the same as #3531?
It is really interesting to see a real instance of Caffe on a ppc64el machine. Thank you for showing that Caffe should be fine to port to ppc64el, with or without CUDA.
Does this commit fix it for you? #3586
Hey SvenTwo,
@anuphalarnkar glad that you solved it. Please add a comment to #3586 indicating that it's fixed for you, so that the PR doesn't get buried.
Hi,
I am working on a PowerPC (ppc64le) machine running Ubuntu 14.04 with 4 GPUs (all Tesla K80s).
make runtest fails with this log:
make: *** [runtest] Segmentation fault (core dumped)
I ran "test_all.testbin" under GDB and got the following output:
Program received signal SIGSEGV, Segmentation fault.
__memcpy_ppc () at ../sysdeps/powerpc/powerpc64/memcpy.S:364
364 ../sysdeps/powerpc/powerpc64/memcpy.S: No such file or directory.
I looked at the backtrace with gdb. Here is what I found.
According to the lshw command on Linux, the GPU memory is mapped on the PCI bus at addresses starting with 0x3xxx xxxx xxxx.
The return address of each call is shown on the left-hand side, followed by the function:
0x00003fffb35019f0 -> caffe::P2PSync::run
0x00003fffb34fb2b0 -> caffe::DevicePair::compute
0x00003fffb34fccbc -> std::vector<int, std::allocator<int> >::erase
0x00003fffb30a1068 -> __GI_memmove (dest=0x153ea198, src=, len=)
In the last function, __GI_memmove, I suspect that the destination address is an offset within the GPU memory range. For instance, the final computed address could be 0x00003fff00000000 + 0x153ea198 = 0x00003fff153ea198.
However, I am unable to relate the 0x3fff prefix to any of the GPU cards on the PCI bus.
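The arithmetic in this hypothesis can be checked with a small C++ sketch. Both helper names are hypothetical. The base 0x00003fff00000000 is the assumed value from the paragraph above, not something read from the backtrace, and the range test encodes one reading of "0x3xxx xxxx xxxx" (a 12-hex-digit address whose top digit is 3). It only verifies the sum and the digit pattern; it does not show that the address belongs to a GPU.

```cpp
#include <cstdint>

// Hypothetical helper: add an offset to an assumed base address.
constexpr std::uint64_t combine(std::uint64_t base, std::uint64_t offset) {
    return base + offset;
}

// Hypothetical helper: true if addr matches the pattern 0x3xxx xxxx xxxx,
// read here as a 48-bit address whose top hex digit is 3. This is an
// assumption about how to interpret the range reported by lshw.
constexpr bool matches_reported_pattern(std::uint64_t addr) {
    return (addr >> 44) == 0x3;
}
```

With these helpers, combine(0x00003fff00000000, 0x153ea198) gives 0x00003fff153ea198, which matches the pattern, while the raw dest value 0x153ea198 does not. The return addresses listed above, such as 0x00003fffb30a1068, match the same pattern, so this test alone cannot tell GPU memory apart from other mappings.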
The full gdb backtrace is pasted below for reference.
(gdb) bt
#0 __memcpy_ppc () at ../sysdeps/powerpc/powerpc64/memcpy.S:364
#1 0x00003fffb30a1068 in __GI_memmove (dest=0x153ea198, src=, len=) at ../sysdeps/powerpc/memmove.c:54
#2 0x00003fffb34fccbc in std::vector<int, std::allocator<int> >::erase(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) () from /root/anup/caffe/.build_release/test/../lib/libcaffe.so
#3 0x00003fffb34fb2b0 in caffe::DevicePair::compute(std::vector<int, std::allocator<int> >, std::vector<caffe::DevicePair, std::allocator<caffe::DevicePair> >*) () from /root/anup/caffe/.build_release/test/../lib/libcaffe.so
#4 0x00003fffb35019f0 in caffe::P2PSync::run(std::vector<int, std::allocator<int> > const&) () from /root/anup/caffe/.build_release/test/../lib/libcaffe.so
#5 0x0000000010235074 in caffe::GradientBasedSolverTestcaffe::GPUDevice::RunLeastSquaresSolver(float, float, float, int, int, int, bool, char const*) ()
#6 0x0000000010247414 in caffe::GradientBasedSolverTestcaffe::GPUDevice::TestLeastSquaresUpdate(float, float, float, int) ()
#7 0x00000000102488bc in caffe::SGDSolverTest_TestLeastSquaresUpdate_Testcaffe::GPUDevice::TestBody() ()
#8 0x000000001053ce68 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#9 0x000000001052f400 in testing::Test::Run() ()
#10 0x000000001052f53c in testing::TestInfo::Run() ()
#11 0x000000001052f724 in testing::TestCase::Run() ()
#12 0x0000000010533bc0 in testing::internal::UnitTestImpl::RunAllTests() ()
#13 0x0000000010533f60 in testing::UnitTest::Run() ()
#14 0x000000001005c038 in main ()
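Frames #1 and #2 show memmove being called from std::vector<int>::erase. That pairing is expected on its own: for trivially copyable elements such as int, libstdc++ typically shifts the tail of the vector down with memmove, and the destination is an address inside the vector's own buffer. The sketch below illustrates this with a hypothetical helper. It is not the Caffe code, and it does not explain why the call faulted here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not from Caffe: erase the element at `index` and
// return the address that the tail of the vector is shifted down to.
// Returns nullptr without touching the vector if `index` is out of range,
// because calling erase() with an iterator outside [begin(), end()) is
// undefined behaviour.
const int* erase_at(std::vector<int>& v, std::size_t index) {
    if (index >= v.size()) {
        return nullptr;
    }
    const int* dest = v.data() + index;  // where the shifted tail will land
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(index));
    return dest;
}
```

Calling erase_at on a vector holding 0, 1, 2, 3 with index 1 leaves 0, 2, 3 and returns a pointer one element past the start of the buffer. An out-of-range index is rejected rather than passed on to erase().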
Any input would be greatly appreciated.
Thanks in advance,
Anup Halarnkar