Remove unnecessary tensor copy in load op #10402

kexinzhao · 2018-05-04T03:24:25Z

The DeserializeFromStream function calls TensorFromStream, which contains the following code:

Paddle/paddle/fluid/framework/tensor_util.cc

Lines 319 to 336 in ccc594e

    
               if (platform::is_gpu_place(dev_ctx.GetPlace())) { 
        
           #ifdef PADDLE_WITH_CUDA 
        
                 Tensor cpu_tensor; 
        
                 cpu_tensor.Resize(framework::make_ddim(dims)); 
        
                 framework::VisitDataType( 
        
                     desc.data_type(), 
        
                     DeserializedDataFunctor(&buf, &cpu_tensor, ctx.GetPlace())); 
        
                 is.read(static_cast<char*>(buf), cpu_tensor.memory_size()); 
        
                 auto dst_place = dev_ctx.GetPlace(); 
        
                 framework::TensorCopy(cpu_tensor, dst_place, dev_ctx, tensor); 
        
           #else 
        
                 PADDLE_THROW("Unexpected branch"); 
        
           #endif 
        
               } else { 
        
                 framework::VisitDataType( 
        
                     desc.data_type(), 
        
                     DeserializedDataFunctor(&buf, tensor, ctx.GetPlace())); 
        
                 is.read(static_cast<char*>(buf), tensor->memory_size());

This means that TensorFromStream first loads the tensor from disk into CPU memory. If the load op runs on a GPU place, it then copies the tensor to the GPU via framework::TensorCopy(cpu_tensor, dst_place, dev_ctx, tensor).

So in the load op, we don't need to do the copy from CPU to GPU again.
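To make the redundancy concrete, here is a minimal, self-contained sketch that models the situation with toy types. Nothing in it is PaddlePaddle API: `ToyTensor`, `Place`, `ToyTensorCopy`, `ToyTensorFromStream`, `ToyLoad`, and `ToyLoadWithRedundantCopy` are hypothetical names, the "device copy" is only a counter, and the "before" variant is an assumption about the general shape of the old load op rather than a quote of it.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are not PaddlePaddle types.
enum class Place { kCPU, kGPU };

struct ToyTensor {
  Place place = Place::kCPU;
  std::vector<char> data;
};

// Counts simulated host-to-device copies so the extra copy is observable.
static int g_device_copies = 0;

// Simulated tensor copy; a real implementation would move bytes to the device.
void ToyTensorCopy(const ToyTensor& src, Place dst_place, ToyTensor* dst) {
  assert(dst != nullptr);
  dst->data = src.data;
  dst->place = dst_place;
  if (dst_place == Place::kGPU) ++g_device_copies;
}

// Same shape as the snippet above: for a GPU target, read into a staging
// CPU tensor and copy once; for a CPU target, read directly into the output.
void ToyTensorFromStream(std::istream& is, std::size_t num_bytes,
                         Place target, ToyTensor* out) {
  assert(out != nullptr);
  if (target == Place::kGPU) {
    ToyTensor cpu_tensor;
    cpu_tensor.data.resize(num_bytes);
    is.read(cpu_tensor.data.data(), static_cast<std::streamsize>(num_bytes));
    ToyTensorCopy(cpu_tensor, target, out);
  } else {
    out->place = Place::kCPU;
    out->data.resize(num_bytes);
    is.read(out->data.data(), static_cast<std::streamsize>(num_bytes));
  }
}

// Assumed "before" shape: deserialize, then copy to the GPU a second time.
void ToyLoadWithRedundantCopy(std::istream& is, std::size_t num_bytes,
                              Place target, ToyTensor* out) {
  ToyTensor loaded;
  ToyTensorFromStream(is, num_bytes, target, &loaded);
  if (target == Place::kGPU) {
    ToyTensorCopy(loaded, target, out);  // second, unnecessary device copy
  } else {
    *out = loaded;
  }
}

// "After" shape: the deserializer already placed the data, so just use it.
void ToyLoad(std::istream& is, std::size_t num_bytes, Place target,
             ToyTensor* out) {
  ToyTensorFromStream(is, num_bytes, target, out);
}
```

In this model, loading to `Place::kGPU` through `ToyLoad` increments the counter once, while `ToyLoadWithRedundantCopy` increments it twice and produces the same bytes.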

reyoung

Excellent

kexinzhao force-pushed the prune_load_op branch from 914d3a6 to bc98165 Compare May 4, 2018 21:26

wangkuiyi requested review from reyoung and Xreki May 4, 2018 21:49

remove unnecessary tensor copy in save op

6565020

kexinzhao force-pushed the prune_load_op branch from bc98165 to 6565020 Compare May 6, 2018 23:07

reyoung approved these changes May 9, 2018

kexinzhao merged commit 170ac72 into PaddlePaddle:develop May 9, 2018

kexinzhao deleted the prune_load_op branch May 9, 2018 04:24
