Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CPU only memcpy #633

Merged
merged 5 commits into from
Jul 10, 2014
Merged

Support CPU only memcpy #633

merged 5 commits into from
Jul 10, 2014

Conversation

kloudkl
Copy link
Contributor

@kloudkl kloudkl commented Jul 7, 2014

On a machine without GPU and that can't install the CUDA driver, #555 causes the error Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version.

This PR adds an option to fall back in such situation.

@kloudkl
Copy link
Contributor Author

kloudkl commented Jul 7, 2014

@shelhamer, @jeffdonahue, @sguada, the very simple changes are tested and ready to be merged.

@longjon
Copy link
Contributor

longjon commented Jul 8, 2014

See comments at #604; I'm inclined to merge this soon.

To be super pedantic, I'd rather the check be for Caffe::mode() == Caffe::GPU, since

  • cudaMemcpy is explicitly a GPU thing, memcpy is an everybody thing
  • other checks in caffe of this form check against GPU mode
  • it makes it easier to search for GPU-specific code

@shelhamer
Copy link
Member

Agreed with @longjon's #604 (comment) and the pedantry. Only obstacle to merge is that this causes a bus error on a lot of GPU tests if you do have a GPU?

make: *** [runtest] Bus error: 10

@kloudkl
Copy link
Contributor Author

kloudkl commented Jul 9, 2014

@shelhamer, @longjon, SyncedMemory automatically switches to GPU mode when data is transferred between different devices. Both platforms with and without GPU have been tested. There is no longer any problem.

@@ -33,6 +33,7 @@ inline void SyncedMemory::to_cpu() {
CaffeMallocHost(&cpu_ptr_, size_);
own_cpu_data_ = true;
}
Caffe::set_mode(Caffe::GPU);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why set to GPU, is it CPU?

@longjon
Copy link
Contributor

longjon commented Jul 9, 2014

https://github.com/kloudkl/caffe/commit/ec967a5e38e78b15e75184803d5b97f018cd78cc makes me nervous. It seems one could do some work in GPU mode, tell caffe to switch to CPU mode, but be silently rebuffed and end up back in GPU mode. (Or, even without mode switching, one could start in CPU mode, do some GPU fiddling unrelated to running the net, and suddenly be in GPU mode.) In general I think Caffe::set_mode should only be called by user code.

How did we get there? The tests crash without that last commit; It looks like some of them assume that it's fine to invoke GPU memory in CPU mode. So what should it mean to be in CPU mode?

  1. Only the CPU is available, and all attempts to use the GPU should fail, or
  2. only the CPU will be used for computation, but it's fine to access all available hardware?

Currently it means something more like the latter, but that means memcpy is not necessarily going to work just because the mode is CPU.

Note that if a GPU is physically present, it's always okay to call cudaMemcpy, and if a GPU is not physically present, it's always okay to call memcpy. So maybe it's this condition, rather than the caffe mode, which we should check for.

@kloudkl
Copy link
Contributor Author

kloudkl commented Jul 9, 2014

The last resort is cuPointerGetAttribute.

@kloudkl
Copy link
Contributor Author

kloudkl commented Jul 9, 2014

cuPointerGetAttribute relies on the CUDA driver. It's not applicable on the CPU.

libgpuarray recommended by @bhack wraps the devices pointers to abstract out the differences.
CUDA

struct _gpudata {
#ifdef DEBUG
  char tag[8];
#endif
  CUdeviceptr ptr;
  CUevent ev;
  size_t sz;
  cuda_context *ctx;
  int flags;
  unsigned int refcnt;
};

OpenCL


struct _gpudata {
#ifdef DEBUG
  char tag[8];
#endif
  cl_mem buf;
  cl_event ev;
  cl_ctx *ctx;
  unsigned int refcnt;
};

The pointers must carry the pointer types with themselves to discriminate easily.

enum MemoryType {
  MEMORYTYPE_HOST,
  MEMORYTYPE_GPU,
  MEMORYTYPE_OPENCL,
  MEMORYTYPE_UNIFIED
}
template <typename Dtype>
class DevicePtr {
 public:
  Dtype* ptr;
  MemoryType type;
}

@robwhess, does this fit #587?

@Yangqing
Copy link
Member

Yangqing commented Jul 9, 2014

It makes me a little nervous too... I feel that syncedmem actually abuses caffe_memcpy a little bit (if it was me who wrote the code, blame me :)). Since it knows exactly which direction the copy is being carried out, it should explicitly call cudaMemcpy, which I assume would partially solve the problem @kloudkl points out? Setting a global variable to change things is a little flaky (and may introduce thread safety issues).

@kloudkl
Copy link
Contributor Author

kloudkl commented Jul 9, 2014

Happy ending!

@longjon
Copy link
Contributor

longjon commented Jul 9, 2014

Calling cudaMemcpy explicitly fixes the issue, although it regresses the uniform calling convention established in #555.

Searching through the code, note that caffe_copy is only ever called in an explicitly modal context, either in a case statement or a *_xpu function. Also note that caffe_memcpy is only ever called by SyncedMemory, and only for transfers between host and device. So, I suggest one of the following:

  • forgo the uniform argument order and allow SyncedMemory the explicit CUDA call, and get rid of the now unused caffe_memcpy; or
  • insist on the uniform argument order, rename caffe_memcpy to caffe_gpu_memcpy, and call it only from SyncedMemory, where it is amodal but explicitly requires the GPU.

Thoughts? (@shelhamer, how do feel about these options w.r.t. #555?)

As these are now rather fine points which can be corrected down the road, I intend in any case to merge one of these options later today to fix the issue when running without GPU.

@shelhamer
Copy link
Member

I vote for caffe_gpu_memcpy as you suggested. caffe_memcpy is quite a
special-purpose function, and only exists to abstract away the host/device
transfer from CUDA so that a CPU-only and/or OpenCL variation of Caffe
would have one less tendril of CUDA in the core code.

This fixes the issue, maintains the calling convention, and still lets us
take advantage of UVA once our plans take shape.

Le mercredi 9 juillet 2014, longjon notifications@github.com a écrit :

Calling cudaMemcpy explicitly fixes the issue, although it regresses the
uniform calling convention established in #555
#555.

Searching through the code, note that caffe_copy is only ever called in
an explicitly modal context, either in a case statement or a *_xpu
function. Also note that caffe_memcpy is only ever called by SyncedMemory,
and only for transfers between host and device. So, I suggest one of the
following:

  • forgo the uniform argument order and allow SyncedMemory the explicit
    CUDA call, and get rid of the now unused caffe_memcpy; or
  • insist on the uniform argument order, rename caffe_memcpy to
    caffe_gpu_memcpy, and call it only from SyncedMemory, where it is
    amodal but explicitly requires the GPU.

Thoughts? (@shelhamer https://github.com/shelhamer, how do feel about
these options w.r.t. #555 #555?)

As these are now rather fine points which can be corrected down the road,
I intend in any case to merge one of these options later today to fix the
issue when running without GPU.


Reply to this email directly or view it on GitHub
#633 (comment).

@kloudkl
Copy link
Contributor Author

kloudkl commented Jul 10, 2014

Finally done.

longjon added a commit that referenced this pull request Jul 10, 2014
@longjon longjon merged commit efa7176 into BVLC:dev Jul 10, 2014
@longjon
Copy link
Contributor

longjon commented Jul 10, 2014

Passes tests, warn, and lint; merged. Thanks @kloudkl for your patience and keeping this up-to-date with the discussion.

We should also add comments at some point explaining that mode should be set before calling caffe_copy, and the difference between caffe_copy and caffe_gpu_memcpy.

@kloudkl kloudkl deleted the cpu-only-memcpy branch July 10, 2014 01:41
mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014
RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request Nov 4, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants