This repository has been archived by the owner on Oct 13, 2021. It is now read-only.

Add support for tf.nn.depth_to_space lambda #492

Merged
merged 9 commits into onnx:master
May 19, 2020

Conversation

CNugteren
Contributor

Added support for tf.nn.depth_to_space as a Keras lambda, mapped to the ONNX DepthToSpace operator.

The two added test-cases visualised with Netron:

Channels first (NCHW): the channel dimension is reduced from 4 to 4/(2*2) = 1, and the width and height dimensions are increased by a factor of 2:
[Image: temp_NCHW onnx]

Channels last (NHWC): the channel dimension is reduced from 8 to 8/(2*2) = 2, and the width and height dimensions are increased by a factor of 2:
[Image: temp_NHWC onnx]
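The shape arithmetic in the two test cases above can be sketched as a small helper. The function name `depth_to_space_shape` and the spatial sizes in the usage lines are hypothetical and not taken from the PR; `None` stands for an unknown (dynamic) dimension.

```python
def depth_to_space_shape(shape, block_size, data_format="NHWC"):
    """Return the output shape of a depth-to-space op (illustrative helper).

    `shape` is a 4-tuple; batch, height and width may be None (unknown),
    but the channel count must be a known multiple of block_size ** 2.
    """
    if data_format == "NHWC":
        n, h, w, c = shape
    elif data_format == "NCHW":
        n, c, h, w = shape
    else:
        raise ValueError("unsupported data_format: %r" % (data_format,))

    if c is None or c % (block_size ** 2) != 0:
        raise ValueError("channels must be a known multiple of block_size ** 2")

    out_c = c // (block_size ** 2)
    out_h = None if h is None else h * block_size
    out_w = None if w is None else w * block_size

    if data_format == "NHWC":
        return (n, out_h, out_w, out_c)
    return (n, out_c, out_h, out_w)


# Hypothetical 3x3 inputs with the channel counts from the two test cases.
print(depth_to_space_shape((1, 4, 3, 3), 2, "NCHW"))
print(depth_to_space_shape((None, 3, 3, 8), 2, "NHWC"))
```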

Although the visualisations look correct as far as I can see, the NHWC test-case fails with the following error in ONNX Runtime (ORT):

onnxruntime.capi.onnxruntime_pybind11_state.Fail:
[ONNXRuntimeError] : 1 : FAIL : Node:lambda/DepthToSpace Output:lambda
[ShapeInferenceError] Can't merge shape info.
Both source and target dimension have values but they differ.
Source=1 Target=8 Dimension=1

This is my first contribution to keras-onnx (and first ONNX contribution in general), so please review carefully. And I'm also asking for help w.r.t. the second test-case. Or is that a bug in ORT?

@CLAassistant

CLAassistant commented May 17, 2020

CLA assistant check
All committers have signed the CLA.

@jiafatom
Collaborator

Thanks for your contribution. There is a test failure for your case: a dimension mismatch. This is because the specs of TensorFlow's DepthToSpace and ONNX's DepthToSpace may differ. It is not a simple drop-in replacement; you need to understand both the TF and the ONNX behavior, and use ONNX ops to construct the TF op correctly, so it may need some additional manipulation such as a Transpose.

@CNugteren
Contributor Author

CNugteren commented May 17, 2020

Thanks @jiafatom. Yes, of course I noticed the newly introduced failed test, see also my message :-) I'll dig deeper and see if I can figure out what needs to be done, thanks.

(sorry for closing/re-opening, pressed the wrong button)

@CNugteren CNugteren closed this May 17, 2020
@jiafatom
Collaborator

There is an example of using the ONNX DepthToSpace op here; you can see that we add a few ops to complete the conversion.

@CNugteren CNugteren reopened this May 17, 2020
@CNugteren
Contributor Author

CNugteren commented May 18, 2020

TensorFlow documentation for the NHWC case (the failing one):

It is useful to consider the operation as transforming a 6-D Tensor. e.g. for data_format = NHWC, Each element in the input tensor can be specified via 6 coordinates, ordered by decreasing memory layout significance as: n,iY,iX,bY,bX,oC (where n=batch index, iX, iY means X or Y coordinates within the input image, bX, bY means coordinates within the output block, oC means output channels). The output would be the input transposed to the following layout: n,iY,bY,iX,bX,oC

If we number n,iY,iX,bY,bX,oC as n=0, iY=1, iX=2, bY=3, bX=4, oC=5, then the output layout n,iY,bY,iX,bX,oC corresponds to the permutation 0,1,3,2,4,5. That indeed doesn't seem to match the ONNX documentation for the CRD case:

b, c, h, w = x.shape
tmp = np.reshape(x, [b, c // (blocksize ** 2), blocksize, blocksize, h, w])
tmp = np.transpose(tmp, [0, 1, 4, 2, 5, 3])
y = np.reshape(tmp, [b, c // (blocksize ** 2), h * blocksize, w * blocksize])

It does match after inserting a transpose [0, 3, 1, 2] at the start and [0, 2, 3, 1] at the end. But if we implemented that, why use the ONNX DepthToSpace operator at all, since it is essentially just two reshapes and a single transpose?

So I've implemented the behavior described in TensorFlow's documentation directly, using two reshapes and a transpose, for the NHWC case. For the NCHW case the test already seems to pass, so the ONNX DCR case seems to match TensorFlow's NCHW implementation.
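For reference, the "two reshapes and a transpose" formulation of the NHWC case can be sketched in NumPy, following the 6-D layout quoted from the TensorFlow documentation above. This is an illustration only, not the converter code from this PR; the function name `depth_to_space_nhwc` is hypothetical.

```python
import numpy as np


def depth_to_space_nhwc(x, block_size):
    """NumPy sketch of depth-to-space for NHWC input (illustrative only)."""
    n, h, w, c = x.shape
    oc = c // (block_size ** 2)
    # View the channel axis as (bY, bX, oC): layout n, iY, iX, bY, bX, oC.
    tmp = x.reshape(n, h, w, block_size, block_size, oc)
    # Move each block coordinate next to its image coordinate:
    # n, iY, bY, iX, bX, oC, i.e. the permutation 0, 1, 3, 2, 4, 5.
    tmp = tmp.transpose(0, 1, 3, 2, 4, 5)
    # Merge (iY, bY) into the output height and (iX, bX) into the output width.
    return tmp.reshape(n, h * block_size, w * block_size, oc)


# A single pixel with 4 channels becomes a 2x2 image with 1 channel.
x = np.arange(4).reshape(1, 1, 1, 4)
y = depth_to_space_nhwc(x, 2)
print(y[0, :, :, 0])
```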

@CNugteren
Contributor Author

CNugteren commented May 18, 2020

I've 'fixed' some tests by skipping old TensorFlow versions and old opsets, but I still see this kind of failure in the CI builds:

E   tensorflow.python.framework.errors_impl.InvalidArgumentError:  Only NHWC data_format supported on CPU. Got NCHW
E   	 [[node lambda_1/DepthToSpace (defined at c:\miniconda\envs\py3.7\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_keras_scratch_graph_1818]
E   

I don't have this issue on my test machine (also CPU only). Did you have similar issues in other tests and if so how do you normally circumvent them?

@jiafatom
Collaborator

The failing CI builds are linux-conda_ci and win32-conda-ci, which use standalone Keras with the TF backend (not tf.keras). If you set TF_KERAS=0 on your local machine, can you reproduce the issue?

@CNugteren
Contributor Author

No, it turned out that older versions of TF (pre 2.1.0) don't support NCHW mode for this op on the CPU. So I've added another skip in the tests.
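A version-based skip of this kind might look roughly like the sketch below. The helper names, the hard-coded `TF_VERSION` string, and the test class are all hypothetical; the 2.1.0 threshold is the one stated in the comment above, and a real test would read `tensorflow.__version__` instead.

```python
import re
import unittest

# Hypothetical stand-in for tensorflow.__version__.
TF_VERSION = "2.0.0"


def version_tuple(version):
    """Parse the leading 'major.minor[.patch]' of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def supports_nchw_depth_to_space_on_cpu(version):
    """Assumption from the discussion: NCHW on CPU needs TF 2.1.0 or newer."""
    return version_tuple(version) >= (2, 1, 0)


class DepthToSpaceTest(unittest.TestCase):
    @unittest.skipIf(not supports_nchw_depth_to_space_on_cpu(TF_VERSION),
                     "NCHW depth_to_space is not supported on CPU before TF 2.1.0")
    def test_depth_to_space_nchw(self):
        pass  # the actual conversion test would go here
```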

block_size = node.get_attr('block_size')
oopb = OnnxOperatorBuilder(container, scope)
if _is_nhwc(node):
    _, h, w, c = _cal_tensor_shape(node.inputs[0])
Contributor Author

I still need some help here. I have now set n = -1 so that an unknown batch dimension is OK. But what if h and w are also unknown, i.e. their values are None? How do I then do the reshapes below?

Collaborator

Good point. This line has an issue when the tensor shape is unknown (dynamic). Search our code base for the cases where the result of _cal_tensor_shape is None. We need to handle that: add a Shape op after node.inputs[0] to get the dynamic shape, use a Slice op to extract h and w, and concatenate them with the other dimensions to build the desired_shape. We have some examples in our code.
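The Shape / Slice / Concat idea can be illustrated with a NumPy analogy, in which the reshape targets are computed from the runtime shape rather than from static values. This is not keras2onnx code: the function name `nhwc_reshape_targets` is hypothetical, and it assumes the channel count is known statically while batch, height and width are only available at runtime.

```python
import numpy as np


def nhwc_reshape_targets(x, block_size):
    """Build both reshape targets from the runtime shape (NumPy analogy)."""
    shape = np.array(x.shape, dtype=np.int64)   # plays the role of a Shape op
    hw = shape[1:3]                             # plays the role of a Slice op
    oc = x.shape[3] // (block_size ** 2)        # channels assumed static

    # Plays the role of a Concat op: [-1, h, w, bs, bs, oc]
    first = np.concatenate([[-1], hw, [block_size, block_size, oc]])
    # A multiply followed by a Concat: [-1, h * bs, w * bs, oc]
    second = np.concatenate([[-1], hw * block_size, [oc]])
    return first, second


x = np.zeros((2, 3, 5, 8))
first, second = nhwc_reshape_targets(x, 2)
y = (x.reshape(tuple(first))
      .transpose(0, 1, 3, 2, 4, 5)
      .reshape(tuple(second)))
print(first.tolist(), second.tolist(), y.shape)
```

In an ONNX graph each of these steps would be a separate node, which is the source of the extra nodes that the reply below refers to.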

Contributor Author

@CNugteren CNugteren May 19, 2020

I had a look at an example and it is not trivial: the code quickly becomes unreadable because of all the extra nodes. So I reconsidered using the actual DepthToSpace node, worked on that a bit, and made it work with two extra transposes. I've also added a test case with unknown tensor sizes, and that one now passes too :-) Could you have another look at the code? Thanks!
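As a sanity check of the two-transposes approach, the following NumPy sketch compares the TensorFlow NHWC semantics with an NCHW depth-to-space wrapped in the transposes [0, 3, 1, 2] and [0, 2, 3, 1]. It assumes ONNX's DCR mode, using the reshape/transpose formula given for that mode in the ONNX operator documentation; the function names are hypothetical and this is not the converter code itself.

```python
import numpy as np


def tf_depth_to_space_nhwc(x, bs):
    """TensorFlow NHWC semantics, per the 6-D layout in the TF documentation."""
    n, h, w, c = x.shape
    tmp = x.reshape(n, h, w, bs, bs, c // (bs ** 2))
    tmp = tmp.transpose(0, 1, 3, 2, 4, 5)
    return tmp.reshape(n, h * bs, w * bs, c // (bs ** 2))


def onnx_depth_to_space_dcr(x, bs):
    """ONNX DepthToSpace in DCR mode on NCHW input (formula from the ONNX docs)."""
    b, c, h, w = x.shape
    tmp = np.reshape(x, [b, bs, bs, c // (bs ** 2), h, w])
    tmp = np.transpose(tmp, [0, 3, 4, 1, 5, 2])
    return np.reshape(tmp, [b, c // (bs ** 2), h * bs, w * bs])


def nhwc_via_onnx(x, bs):
    """NHWC depth-to-space built from the NCHW op and two transposes."""
    x_nchw = np.transpose(x, [0, 3, 1, 2])      # NHWC -> NCHW
    y_nchw = onnx_depth_to_space_dcr(x_nchw, bs)
    return np.transpose(y_nchw, [0, 2, 3, 1])   # NCHW -> NHWC


# Distinct values make any misplaced element visible in the comparison.
x = np.arange(2 * 3 * 5 * 8).reshape(2, 3, 5, 8)
assert np.array_equal(nhwc_via_onnx(x, 2), tf_depth_to_space_nhwc(x, 2))
```

Because the wrapped version never needs the values of h and w, it sidesteps the dynamic-shape bookkeeping that the explicit reshapes would require.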

Collaborator

@jiafatom jiafatom left a comment

lgtm, thanks for your contribution!

@jiafatom jiafatom merged commit 717ea09 into onnx:master May 19, 2020