Support CPU only memcpy #633

kloudkl · 2014-07-07T07:39:52Z

On a machine that has no GPU and cannot install the CUDA driver, #555 causes the error

Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version

This PR adds an option to fall back to plain memcpy in that situation.

kloudkl · 2014-07-07T07:49:04Z

@shelhamer, @jeffdonahue, @sguada, these small changes have been tested and are ready to be merged.

longjon · 2014-07-08T20:07:17Z

See comments at #604; I'm inclined to merge this soon.

To be super pedantic, I'd rather the check be for Caffe::mode() == Caffe::GPU, since

- cudaMemcpy is explicitly a GPU thing, while memcpy is an everybody thing;
- other checks of this form in Caffe check against GPU mode; and
- it makes GPU-specific code easier to search for.
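A minimal sketch of the mode check described above, assuming a simplified global mode in place of Caffe's Caffe::mode() singleton. Mode, g_mode, mode_copy, and gpu_copy_stub are hypothetical names; the stub stands in for cudaMemcpy so the dispatch can be exercised on a machine without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for Caffe's global mode (not the real Caffe class).
enum class Mode { CPU, GPU };
Mode g_mode = Mode::CPU;

// Hypothetical stand-in for a CUDA copy. A CUDA build would call
// cudaMemcpy(dst, src, bytes, cudaMemcpyDefault) here; this stub counts the
// call and copies on the host so the sketch runs without a GPU.
int g_gpu_copies = 0;
void gpu_copy_stub(void* dst, const void* src, std::size_t bytes) {
  ++g_gpu_copies;
  std::memcpy(dst, src, bytes);
}

// Copy n elements from X to Y. The branch tests explicitly for GPU mode, so
// the CUDA path is the special case and is easy to grep for; every other
// mode falls back to plain memcpy.
template <typename Dtype>
void mode_copy(std::size_t n, const Dtype* X, Dtype* Y) {
  assert(X != nullptr && Y != nullptr);
  if (X == Y) return;  // copying a buffer onto itself is a no-op
  const std::size_t bytes = n * sizeof(Dtype);
  if (g_mode == Mode::GPU) {
    gpu_copy_stub(Y, X, bytes);
  } else {
    std::memcpy(Y, X, bytes);
  }
}
```

This is only safe if CPU mode really implies that every pointer passed in is a host pointer.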

shelhamer · 2014-07-08T21:30:23Z

Agreed with @longjon's #604 (comment) and the pedantry. The only obstacle to merging is that this seems to cause a bus error in many GPU tests when you do have a GPU:

make: *** [runtest] Bus error: 10

kloudkl · 2014-07-09T02:24:18Z

@shelhamer, @longjon, SyncedMemory now automatically switches to GPU mode when data is transferred between different devices. I have tested on platforms both with and without a GPU, and the problem is gone.

thuanvh · 2014-07-09T04:16:34Z

src/caffe/syncedmem.cpp

@@ -33,6 +33,7 @@ inline void SyncedMemory::to_cpu() {
      CaffeMallocHost(&cpu_ptr_, size_);
      own_cpu_data_ = true;
    }
+    Caffe::set_mode(Caffe::GPU);


Why set the mode to GPU here? Shouldn't it be CPU?

longjon · 2014-07-09T09:06:35Z

https://github.com/kloudkl/caffe/commit/ec967a5e38e78b15e75184803d5b97f018cd78cc makes me nervous. It seems one could do some work in GPU mode, tell caffe to switch to CPU mode, but be silently rebuffed and end up back in GPU mode. (Or, even without mode switching, one could start in CPU mode, do some GPU fiddling unrelated to running the net, and suddenly be in GPU mode.) In general I think Caffe::set_mode should only be called by user code.

How did we get here? The tests crash without that last commit; it looks like some of them assume that it's fine to access GPU memory in CPU mode. So what should it mean to be in CPU mode?

- only the CPU is available, and all attempts to use the GPU should fail; or
- only the CPU will be used for computation, but it's fine to access all available hardware?

Currently it means something more like the latter, but that means memcpy is not necessarily going to work just because the mode is CPU.

Note that if a GPU is physically present, it's always okay to call cudaMemcpy, and if a GPU is not physically present, it's always okay to call memcpy. So maybe it's this condition, rather than the caffe mode, which we should check for.
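The alternative proposed here keys the choice on hardware presence rather than on the mode. A minimal sketch, assuming the probe runs once and its answer is cached; probe_device_count, gpu_present, hw_copy, and device_copy_stub are hypothetical names, and the stubs stand in for CUDA calls so the block builds without the toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical probe. A CUDA build might call cudaGetDeviceCount(&count) and
// treat any error (no device, driver too old for the runtime) as zero
// devices. Here the answer comes from a variable so the sketch runs anywhere.
int g_fake_device_count = 0;
int probe_device_count() { return g_fake_device_count; }

// Whether a GPU is physically present. This is independent of the compute
// mode the user selected, and is computed once on first use.
bool gpu_present() {
  static const bool present = probe_device_count() > 0;
  return present;
}

// Hypothetical stand-in for cudaMemcpy(dst, src, bytes, cudaMemcpyDefault).
int g_device_copies = 0;
void device_copy_stub(void* dst, const void* src, std::size_t bytes) {
  ++g_device_copies;
  std::memcpy(dst, src, bytes);
}

// With a GPU present, the CUDA copy is assumed safe for any pointer pair.
// With no GPU, every pointer must be a host pointer, so memcpy is safe.
void hw_copy(void* dst, const void* src, std::size_t bytes) {
  assert(dst != nullptr && src != nullptr);
  if (gpu_present()) {
    device_copy_stub(dst, src, bytes);
  } else {
    std::memcpy(dst, src, bytes);
  }
}
```

Caching the probe avoids querying the driver on every copy; the trade-off is that a device that becomes available after the first copy is ignored.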

kloudkl · 2014-07-09T09:17:10Z

The last resort is cuPointerGetAttribute.

kloudkl · 2014-07-09T12:34:23Z

cuPointerGetAttribute relies on the CUDA driver, so it is not usable on a CPU-only machine.

libgpuarray, recommended by @bhack, wraps device pointers to abstract away the differences between backends:
CUDA

struct _gpudata {
#ifdef DEBUG
  char tag[8];
#endif
  CUdeviceptr ptr;
  CUevent ev;
  size_t sz;
  cuda_context *ctx;
  int flags;
  unsigned int refcnt;
};

OpenCL


struct _gpudata {
#ifdef DEBUG
  char tag[8];
#endif
  cl_mem buf;
  cl_event ev;
  cl_ctx *ctx;
  unsigned int refcnt;
};

Pointers must carry their memory type with them so that host and device pointers are easy to tell apart.

enum MemoryType {
  MEMORYTYPE_HOST,
  MEMORYTYPE_GPU,
  MEMORYTYPE_OPENCL,
  MEMORYTYPE_UNIFIED
};

// A raw pointer tagged with the kind of memory it points into.
template <typename Dtype>
class DevicePtr {
 public:
  Dtype* ptr;
  MemoryType type;
};

@robwhess, does this fit #587?
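To illustrate how tagged pointers would let a copy routine pick its backend without consulting any global mode, here is a minimal sketch. It repeats the MemoryType enum so that it compiles on its own and writes DevicePtr as a struct; CopyKind, classify_copy, and tagged_copy are hypothetical names. As an assumption, it conservatively routes unified memory through the CUDA path, and it implements only the host-to-host case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

enum MemoryType {
  MEMORYTYPE_HOST,
  MEMORYTYPE_GPU,
  MEMORYTYPE_OPENCL,
  MEMORYTYPE_UNIFIED
};

// A raw pointer that carries the kind of memory it points into.
template <typename Dtype>
struct DevicePtr {
  Dtype* ptr;
  MemoryType type;
};

// Which backend a transfer between two tagged pointers would need.
enum class CopyKind { HostToHost, Cuda, OpenCL };

CopyKind classify_copy(MemoryType src, MemoryType dst) {
  // Assumption: unified memory is handled by the CUDA path.
  const bool cuda = src == MEMORYTYPE_GPU || src == MEMORYTYPE_UNIFIED ||
                    dst == MEMORYTYPE_GPU || dst == MEMORYTYPE_UNIFIED;
  const bool opencl = src == MEMORYTYPE_OPENCL || dst == MEMORYTYPE_OPENCL;
  if (cuda && opencl) {
    throw std::invalid_argument("direct CUDA <-> OpenCL copy is not supported");
  }
  if (cuda) return CopyKind::Cuda;
  if (opencl) return CopyKind::OpenCL;
  return CopyKind::HostToHost;
}

// Copy n elements. Only the host path is implemented in this sketch; the
// other kinds would call into a CUDA or OpenCL backend.
template <typename Dtype>
void tagged_copy(std::size_t n, DevicePtr<const Dtype> src, DevicePtr<Dtype> dst) {
  assert(src.ptr != nullptr && dst.ptr != nullptr);
  switch (classify_copy(src.type, dst.type)) {
    case CopyKind::HostToHost:
      std::memcpy(dst.ptr, src.ptr, n * sizeof(Dtype));
      return;
    case CopyKind::Cuda:
    case CopyKind::OpenCL:
      throw std::runtime_error("device backend not available in this sketch");
  }
}
```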

Yangqing · 2014-07-09T17:42:18Z

It makes me a little nervous too... I feel that syncedmem actually abuses caffe_memcpy a little bit (if it was me who wrote the code, blame me :)). Since it knows exactly which direction the copy is being carried out, it should explicitly call cudaMemcpy, which I assume would partially solve the problem @kloudkl points out? Setting a global variable to change things is a little flaky (and may introduce thread safety issues).
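The point here is that SyncedMemory already knows which way each transfer goes. A minimal sketch of that idea, assuming the head-state names used by Caffe's SyncedMemory (UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED); TinySyncedMemory, host_to_device, and device_to_host are hypothetical, and "device" memory is simulated by a second host buffer so the block runs without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical direction-specific transfer hooks. In a CUDA build these would
// be cudaMemcpy(dst, src, n, cudaMemcpyHostToDevice) and
// cudaMemcpy(dst, src, n, cudaMemcpyDeviceToHost). Here "device" memory is a
// second host buffer, and the counters record which direction was used.
int g_h2d = 0;
int g_d2h = 0;
void host_to_device(void* dst, const void* src, std::size_t n) {
  ++g_h2d;
  std::memcpy(dst, src, n);
}
void device_to_host(void* dst, const void* src, std::size_t n) {
  ++g_d2h;
  std::memcpy(dst, src, n);
}

// Simplified model of SyncedMemory's head tracking. Each transfer calls the
// hook for its known direction, so no global mode is read or written.
class TinySyncedMemory {
 public:
  enum Head { UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED };

  explicit TinySyncedMemory(std::size_t size) : size_(size) {}

  const void* cpu_data() { to_cpu(); return cpu_.data(); }
  void* mutable_cpu_data() { to_cpu(); head_ = HEAD_AT_CPU; return cpu_.data(); }
  const void* gpu_data() { to_gpu(); return gpu_.data(); }
  void* mutable_gpu_data() { to_gpu(); head_ = HEAD_AT_GPU; return gpu_.data(); }
  Head head() const { return head_; }

 private:
  void to_cpu() {
    switch (head_) {
      case UNINITIALIZED:
        cpu_.assign(size_, 0);  // fresh, zero-filled host buffer
        head_ = HEAD_AT_CPU;
        break;
      case HEAD_AT_GPU:
        cpu_.resize(size_);
        device_to_host(cpu_.data(), gpu_.data(), size_);
        head_ = SYNCED;
        break;
      case HEAD_AT_CPU:
      case SYNCED:
        break;  // host copy is already current
    }
  }

  void to_gpu() {
    switch (head_) {
      case UNINITIALIZED:
        gpu_.assign(size_, 0);  // fresh, zero-filled "device" buffer
        head_ = HEAD_AT_GPU;
        break;
      case HEAD_AT_CPU:
        gpu_.resize(size_);
        host_to_device(gpu_.data(), cpu_.data(), size_);
        head_ = SYNCED;
        break;
      case HEAD_AT_GPU:
      case SYNCED:
        break;  // device copy is already current
    }
  }

  std::size_t size_;
  std::vector<unsigned char> cpu_;
  std::vector<unsigned char> gpu_;
  Head head_ = UNINITIALIZED;
};
```

Because each branch names its own direction, nothing consults or changes a global mode, which also sidesteps the thread-safety concern raised above.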

kloudkl · 2014-07-09T17:59:22Z

Happy ending!

longjon · 2014-07-09T21:00:29Z

Calling cudaMemcpy explicitly fixes the issue, although it regresses the uniform calling convention established in #555.

Searching through the code, note that caffe_copy is only ever called in an explicitly modal context, either in a case statement or a *_xpu function. Also note that caffe_memcpy is only ever called by SyncedMemory, and only for transfers between host and device. So, I suggest one of the following:

- forgo the uniform argument order, allow SyncedMemory to make the CUDA call explicitly, and get rid of the now-unused caffe_memcpy; or
- insist on the uniform argument order, rename caffe_memcpy to caffe_gpu_memcpy, and call it only from SyncedMemory, where it is amodal but explicitly requires the GPU.

Thoughts? (@shelhamer, how do you feel about these options w.r.t. #555?)

As these are now rather fine points which can be corrected down the road, I intend in any case to merge one of these options later today to fix the issue when running without GPU.

shelhamer · 2014-07-09T21:09:41Z

I vote for caffe_gpu_memcpy as you suggested. caffe_memcpy is quite a special-purpose function, and only exists to abstract away the host/device transfer from CUDA so that a CPU-only and/or OpenCL variation of Caffe would have one less tendril of CUDA in the core code.

This fixes the issue, maintains the calling convention, and still lets us take advantage of UVA once our plans take shape.
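The chosen option keeps a source-first argument order (count, source, destination), the reverse of memcpy(dst, src, n). A minimal sketch, assuming that is the uniform convention #555 established; gpu_memcpy_sketch and raw_device_copy are hypothetical names, and the backend is stubbed with memcpy. In a CUDA build the backend would presumably be cudaMemcpy(Y, X, N, cudaMemcpyDefault), which relies on unified virtual addressing (UVA) to infer the transfer direction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical backend with memcpy-style (dst, src, bytes) order, standing in
// for cudaMemcpy(dst, src, bytes, cudaMemcpyDefault) so the sketch runs on a CPU.
int g_backend_calls = 0;
void raw_device_copy(void* dst, const void* src, std::size_t bytes) {
  ++g_backend_calls;
  std::memcpy(dst, src, bytes);
}

// Amodal but GPU-only: per the plan in this thread, only SyncedMemory calls it,
// for host<->device transfers, regardless of the current mode. Arguments
// follow the uniform (N, X, Y) order: byte count, source, destination.
void gpu_memcpy_sketch(std::size_t N, const void* X, void* Y) {
  assert(X != nullptr && Y != nullptr);
  if (X != Y) {  // skip self-copies
    raw_device_copy(Y, X, N);  // swap back to the backend's (dst, src) order
  }
}
```

Keeping the source-first order means call sites read the same way as caffe_copy(N, X, Y), at the cost of one argument swap inside the wrapper.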

kloudkl · 2014-07-10T00:04:44Z

Finally done.

longjon · 2014-07-10T00:22:44Z

Passes tests, warn, and lint; merged. Thanks @kloudkl for your patience and keeping this up-to-date with the discussion.

We should also add comments at some point explaining that mode should be set before calling caffe_copy, and the difference between caffe_copy and caffe_gpu_memcpy.

thuanvh mentioned this pull request Jul 7, 2014

caffe_copy error in CPU #635

Closed

This was referenced Jul 8, 2014

Implement device abstraction for remaining classes #587

Merged

error == cudaSuccess (38 vs. 0) no CUDA-capable device is detected #604

Closed

thuanvh reviewed Jul 9, 2014

kloudkl added 5 commits July 10, 2014 08:03

Avoid using cudaMemcpy for memcpy when there is no GPU and CUDA driver

3bd49ba

Check the GPU mode to decide which memcpy to use

4d16ed5

Switch to GPU mode when pointer is move to or from GPU in SyncedMemory

00433d8

Implement @Yangqing's solution to copy memory in the SyncedMemory

ac0dd39

Replace cudaMemcpy with caffe_gpu_memcpy in SyncedMemory per @longjon

904c2ce

longjon added a commit that referenced this pull request Jul 10, 2014

Merge pull request #633 from kloudkl/cpu-only-memcpy

efa7176

Support CPU only memcpy

longjon merged commit efa7176 into BVLC:dev Jul 10, 2014

kloudkl deleted the cpu-only-memcpy branch July 10, 2014 01:41

mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014

Merge pull request BVLC#633 from kloudkl/cpu-only-memcpy

62f8dca

Support CPU only memcpy

RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request Nov 4, 2014

Merge pull request BVLC#633 from kloudkl/cpu-only-memcpy

51b249f

Support CPU only memcpy
