Multinode over internet, Async and Parallel SGD, WebAssembly #11
Thanks - yeah, I think as long as we stick to the same protobuf definition (sort of like an API) this would be possible. Of course, one needs to write the WebCL implementation (Andrej actually wrote one before, but with a custom format); other than that it should just be a formatting issue.
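To make the "same protobuf definition" point concrete, here is a minimal sketch, assuming the standard `caffe2_pb2` bindings generated from `caffe2.proto` are importable and that `predict_net.pb` (a hypothetical file name) holds a serialized NetDef. Any runtime, including a hypothetical WebCL/browser backend, could consume the same format:

```python
# Minimal sketch: parse a serialized Caffe2 NetDef (assumes caffe2_pb2 bindings
# are on the Python path; "predict_net.pb" is a hypothetical file name).
from caffe2.proto import caffe2_pb2

net = caffe2_pb2.NetDef()
with open("predict_net.pb", "rb") as f:
    net.ParseFromString(f.read())

# The graph is just a named list of operators; a new backend only has to map
# each op type to its own kernel implementation.
print(net.name)
for op in net.op:
    print(op.type, list(op.input), "->", list(op.output))
```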
Feel free to tinker around with it, but this is honestly not a top priority.
@Yangqing Yes, I think browser technology is not mature enough at the moment, so it cannot be a priority for Caffe2. But if WebAssembly ends up covering GPU computing, it could be interesting to have a "compatible" design. Can I ask what your design targets are for distributing training and parameter updates? In Caffe, multi-GPU support is oriented mainly toward fast peer-to-peer communication between GPUs on the PCI bus. Are you also targeting multi-node setups? On what kind of network infrastructure?
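For reference on the parameter-update question, here is a toy, purely illustrative sketch (plain NumPy, not Caffe2's API) of synchronous data-parallel SGD: each simulated worker computes a gradient on its own data shard, and the averaged gradient updates one shared copy of the weights.

```python
# Toy illustration of synchronous data-parallel SGD (plain NumPy, not Caffe2).
import numpy as np

def worker_gradient(w, x_shard, y_shard):
    # Gradient of ||x_shard @ w - y_shard||^2 with respect to w.
    return 2.0 * x_shard.T @ (x_shard @ w - y_shard)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(64, 4)), rng.normal(size=(64, 1))
w = np.zeros((4, 1))
shards = np.array_split(np.arange(64), 4)      # 4 simulated workers

for step in range(100):
    grads = [worker_gradient(w, x[s], y[s]) for s in shards]
    w -= 0.01 * np.mean(grads, axis=0)         # synchronous averaged update
```

In a real multi-node setup, the averaging step is exactly where the communication pattern (PCIe peer-to-peer all-reduce vs. parameter servers over a network) would plug in.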
Some interesting Caffe experiments with multi-core/multi-GPU and cluster nodes: http://arxiv.org/abs/1506.08272
See also BVLC/caffe#1148 (comment)
I see that you are also opening up to a distributed perspective here. What do you think of emerging solutions like http://arxiv.org/abs/1503.05743? Admittedly, that approach relies on WebCL, which is not natively supported in any browser, and it currently only proposes convolutional layers in a distributed fashion. But some discussions around WebAssembly, in which Google is also involved, are starting to consider GPU support for this emerging standard; see WebAssembly/design#273. Do you think the Caffe2 design could be made future-proof enough to scale to the point where every node runs in a browser tab? Or do you not believe that training and network design can evolve at large scale relying on user nodes with "standard" internet-bandwidth interconnections?
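As a rough back-of-the-envelope illustration of the bandwidth concern (numbers assumed for illustration, not taken from the thread): shipping the full float32 weights of an AlexNet-sized model to a browser node over a typical home connection already takes minutes per synchronization, which is why asynchronous and compressed or partial updates would matter at that scale.

```python
# Back-of-the-envelope arithmetic with assumed, illustrative numbers: full
# float32 weights of an AlexNet-sized model (~60M parameters) shipped to a
# browser node over a 10 Mbit/s "standard" home connection.
params = 60_000_000                    # assumed parameter count
payload_mb = params * 4 / 1e6          # float32 bytes -> megabytes
link_mbit_s = 10                       # assumed downlink speed
seconds_per_sync = payload_mb * 8 / link_mbit_s

print(f"{payload_mb:.0f} MB per full weight sync, "
      f"~{seconds_per_sync / 60:.1f} min on {link_mbit_s} Mbit/s")
# Roughly 240 MB and ~3 minutes per synchronization.
```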