This repository has been archived by the owner on Feb 7, 2023. It is now read-only.

Multinode over internet, Async and Parallel SGD, WebAssembly #11

Closed
bhack opened this issue Aug 30, 2015 · 5 comments

Comments

@bhack

bhack commented Aug 30, 2015

I see that you are also opening up to a distributed perspective here. What do you think of emerging solutions like http://arxiv.org/abs/1503.05743? Yes, this relies on WebCL, which is not supported natively in any browser, and it currently proposes distributing only the convolutional layers. But some discussions on WebAssembly, in which Google is also involved, are starting to consider GPU support for this emerging standard. See WebAssembly/design#273. Do you think the caffe2 design could be made future-proof enough to scale to the level where every node could run in a browser tab? Or don't you believe that training and network design could evolve at large scale relying on user nodes with "standard" internet bandwidth interconnection?

@Yangqing
Contributor

Thanks. Yeah, I think as long as we stick to the same protobuf definition (which acts sort of like an API), this would be possible. Of course, one would need to write the WebCL implementation (Andrej actually wrote one before, but with a custom format). Other than that, it should just be a formatting issue.
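The idea of a shared definition acting as the API between backends can be sketched roughly as follows. This is not caffe2 code: the plain list of dicts merely stands in for a protobuf network definition, and the operator names, the `BACKENDS` registry, and `run_net` are all hypothetical, chosen only to show two implementations consuming the same definition.

```python
# Illustrative only: a plain-data network definition (standing in for a
# protobuf message) executed by two interchangeable, hypothetical backends.

NET_DEF = [
    {"type": "Scale", "input": "x", "output": "h", "factor": 2.0},
    {"type": "Relu", "input": "h", "output": "y"},
]

def scale_list(values, op):
    return [v * op["factor"] for v in values]

def relu_list(values, op):
    return [max(0.0, v) for v in values]

def scale_tuple(values, op):
    return tuple(v * op["factor"] for v in values)

def relu_tuple(values, op):
    return tuple(max(0.0, v) for v in values)

# Each backend maps the same operator names to its own implementation.
BACKENDS = {
    "reference": {"Scale": scale_list, "Relu": relu_list},
    "alternative": {"Scale": scale_tuple, "Relu": relu_tuple},
}

def run_net(net_def, backend, inputs):
    """Run every operator in order, looking up kernels by operator type."""
    kernels = BACKENDS[backend]
    blobs = dict(inputs)
    for op in net_def:
        blobs[op["output"]] = kernels[op["type"]](blobs[op["input"]], op)
    return blobs

ref = run_net(NET_DEF, "reference", {"x": [-1.0, 0.5]})
alt = run_net(NET_DEF, "alternative", {"x": (-1.0, 0.5)})
```

Under these assumptions, adding a new backend means supplying kernels for the existing operator names; the network definition itself does not change.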

@Yangqing
Contributor

Feel free to tinker around with it, but this is honestly not a top priority...

On Aug 30, 2015 9:29 AM, "bhack" notifications@github.com wrote:

How do you think a JavaScript back end could be introduced into the design? For example, starting from the JavaScript matrix component https://github.com/mil-tokyo/sushi



@bhack
Author

bhack commented Aug 30, 2015

@Yangqing Yes, I think browser technology is not mature enough at the moment, so it cannot be a priority for caffe2. But I think that if WebAssembly goes ahead and covers GPU computing, it could be interesting to have a "compatible" design. Can I ask what your design targets are for distributing training and parameter updates? In Caffe, multi-GPU support is oriented mainly toward fast P2P communication between GPUs on the PCI bus. Are you also targeting multi-node? On which kinds of network infrastructure?
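For readers unfamiliar with the parameter-update question raised above, a minimal sketch of synchronous data-parallel SGD follows. It is not Caffe or caffe2 code: the function names (`local_gradient`, `sync_sgd_step`, `train`), the one-parameter least-squares model, and the data shards are all hypothetical, chosen only to show each worker computing a gradient on its own shard and the averaged gradient being applied once per step.

```python
# Illustrative only: synchronous data-parallel SGD on a 1-D least-squares
# problem y = w * x. All names and data here are hypothetical.

def local_gradient(w, shard):
    """Mean gradient of 0.5 * (w*x - y)**2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def sync_sgd_step(w, shards, lr):
    """One synchronous step: every worker reports a gradient, then the
    average is applied once (the role an all-reduce or a parameter
    server would play across real nodes)."""
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(grads)

def train(shards, lr=0.05, steps=200, w=0.0):
    """Repeat the synchronous step a fixed number of times."""
    for _ in range(steps):
        w = sync_sgd_step(w, shards, lr)
    return w

# Two "nodes", each holding a shard of points drawn from y = 3x.
shards = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
]
w_final = train(shards)
```

In an asynchronous variant, each worker would apply its gradient as soon as it is ready instead of waiting for the average, which tolerates slow links better at the cost of updates computed from stale parameters.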

@bhack
Author

bhack commented Sep 5, 2015

Some interesting Caffe experiments with Multicore/MultiGPU and Cluster nodes at http://arxiv.org/abs/1506.08272

@bhack
Author

bhack commented Sep 6, 2015

See also BVLC/caffe#1148 (comment)

bhack changed the title from "Network, WebAssembly and Browser" to "Multinode over internet, Async and Parallel SGD, WebAssembly" on Sep 6, 2015
bhack closed this as completed on Feb 1, 2016
bwasti added a commit that referenced this issue Jan 5, 2017
facebook-github-bot pushed a commit that referenced this issue Sep 25, 2017
Summary:
Exposed by UBSAN:
```text
caffe2/caffe2/core/qtensor.h:61:40: runtime error: load of value 190, which is not a valid value for type 'bool'
    #0 0x7fb4fc09c289 in caffe2::QTensor<caffe2::CPUContext>::Resize(std::vector<int, std::allocator<int> >) caffe2/caffe2/core/qtensor.h:61
    #1 0x7fb4fc090403 in caffe2::QuantizedFullyConnectedOp<float, caffe2::CPUContext, caffe2::DefaultEngine>::RunOnDevice() caffe2/caffe2/fb/operators/quantized_fully_connected_op.h:93
    #2 0x7fb4fc08d5ee in caffe2::Operator<caffe2::CPUContext>::Run(int) caffe2/caffe2/core/operator.h:306
    #3 0x426d8a in caffe2::QFCTest(float, float, float, int, int, int, int) caffe2/caffe2/fb/operators/quantized_fully_connected_op_test.cc:78
    #4 0x4295f6 in caffe2::QuantizedFullyConnectedTest_Test_Test::TestBody() caffe2/caffe2/fb/operators/quantized_fully_connected_op_test.cc:110
    #5 0x7fb4eee3b6a1 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2458
    #6 0x7fb4eee2cbe1 in testing::Test::Run() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2475
    #7 0x7fb4eee2cd27 in testing::TestInfo::Run() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2656
    #8 0x7fb4eee2ce34 in testing::TestCase::Run() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2774
    #9 0x7fb4eee2eb8b in testing::internal::UnitTestImpl::RunAllTests() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:4649
    #10 0x7fb4eee2ef3c in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2458
    #11 0x7fb4eee2ef3c in testing::UnitTest::Run() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:4257
    #12 0x7fb4fbee2ed0 in RUN_ALL_TESTS() third-party-buck/gcc-5-glibc-2.23/build/googletest/include/gtest/gtest.h:2233
    #13 0x7fb4fbee2d60 in main common/gtest/LightMain.cpp:12
    #14 0x7fb4e0ef7857 in __libc_start_main /home/engshare/third-party2/glibc/2.23/src/glibc-2.23/csu/../csu/libc-start.c:289
    #15 0x424e08 in _start /home/engshare/third-party2/glibc/2.23/src/glibc-2.23/csu/../sysdeps/x86_64/start.S:118
UndefinedBehaviorSanitizer: invalid-bool-load caffe2/caffe2/core/qtensor.h:61:40
```

Reviewed By: yfeldblum

Differential Revision: D5898877

fbshipit-source-id: e32b1732a1946fdafaec67b3fbc072dc93bcd917