Returning non-tensors from keras.layer.Layer #516

Closed
DavidLandup0 opened this issue Jul 17, 2023 · 8 comments

Comments

@DavidLandup0

DavidLandup0 commented Jul 17, 2023

In Keras 2.0, a keras.layers.Layer was allowed to return both tensors and non-tensors from its call().

Something like:

    def call(self, x):
        x = self.proj(x)
        shape = x.shape
        x = keras.ops.reshape(x, (-1, shape[1] * shape[2], shape[3]))
        x = self.norm(x)
        return x, shape[1], shape[2] # return tensor, original H and W

In the new API, this throws an error:

All operation outputs must be tensors. Operation <LayerName name=layer_name, built=True> returned a non-tensor. Non-tensor received: 56

Attempted a naive fix:

return x, keras.ops.convert_to_tensor(shape[1]), keras.ops.convert_to_tensor(shape[2])

But this seems to "unbuild" the layer:

Layer 'layer_name' looks like it has unbuilt state, but Keras is not able to trace the layer `call()` in order to build it automatically. You must implement the `def build(self, input_shape)` method on your layer. It should create all variables used by the layer (e.g. by calling `layer.build()` on all its children layers).

Using keras.ops.shape falls back to the backend implementation of shape(). With the TensorFlow backend, it's equivalent to tf.shape() and results in the same error.

Is there a supported way to return non-tensors alongside tensors from call() within a Layer? If not, how can shapes be returned as tensors in a multi-backend context?

Thanks!

@ianstenbit
Contributor

Hey @DavidLandup0, can you post a full repro for this?

I'm trying to reproduce it like this:

import keras_core as keras
import numpy as np

class OverlappingPatchingAndEmbedding(keras.layers.Layer):
    def __init__(self, out_channels=32, patch_size=7, stride=4, **kwargs):
        super().__init__(**kwargs)
        self.proj = keras.layers.Conv2D(
            filters=out_channels,
            kernel_size=patch_size,
            strides=stride,
            padding="same",
        )
        self.norm = keras.layers.LayerNormalization()

    def call(self, x):
        x = self.proj(x)
        # B, H, W, C
        shape = x.shape
        x = keras.ops.reshape(x, (-1, shape[1] * shape[2], shape[3]))
        x = self.norm(x)
        return x, shape[1], shape[2]

inputs = np.random.uniform(size=(1, 16, 16, 8))
model = keras.Sequential([OverlappingPatchingAndEmbedding()])
outputs = model(inputs)

print(inputs)
print(outputs)

And I'm not seeing any errors.

@DavidLandup0
Author

That's odd. Here's a clean Google Colab with the exact same code: https://colab.research.google.com/drive/196JtnB1joCRNWyMbDIX4aXvf0_a7MhKI?usp=sharing

@ianstenbit
Contributor

> That's odd. Here's a clean Google Colab with the exact same code: https://colab.research.google.com/drive/196JtnB1joCRNWyMbDIX4aXvf0_a7MhKI?usp=sharing

Ahh, interesting -- it looks like it's working with the JAX backend but not the TF backend. I'm looking into it to try to understand the root cause.

@DavidLandup0
Author

Thanks! It looks like there's a check that explicitly verifies that all returns are tensors and raises the error, around `for x in self.outputs:` in the Keras source.

Without digging too much into it, it looks like a design choice, but I don't know whether it was deliberate 😅

@ianstenbit
Contributor

> Without digging too much into it, it looks like a design choice

Indeed -- I think this was intentional. Maybe @fchollet can shed some light on why this is the case?

In the meantime, we can pass the resulting shapes as symbolic tensors as long as we specify the output spec of our layer. With this definition of the layer, this is now working for me across all backends:

import keras_core as keras
import numpy as np
from keras_core import KerasTensor

class OverlappingPatchingAndEmbedding(keras.layers.Layer):
    def __init__(self, out_channels=32, patch_size=7, stride=4, **kwargs):
        super().__init__(**kwargs)
        self.proj = keras.layers.Conv2D(
            filters=out_channels,
            kernel_size=patch_size,
            strides=stride,
            padding="same",
        )
        self.norm = keras.layers.LayerNormalization()
        self.stride = stride

    def call(self, x):
        x = self.proj(x)
        # B, H, W, C
        shape = x.shape
        x = keras.ops.reshape(x, (-1, shape[1] * shape[2], shape[3]))
        x = self.norm(x)
        return x, keras.ops.convert_to_tensor(shape[1]), keras.ops.convert_to_tensor(shape[2])

    def compute_output_spec(self, inputs):
        input_shape = inputs.shape
        # Spatial dims shrink by the stride; channels come from the projection.
        output_shape = (
            input_shape[0],
            (input_shape[1] // self.stride) * (input_shape[2] // self.stride),
            self.proj.filters,
        )
        output_spec = KerasTensor(output_shape, dtype=self.compute_dtype)
        return output_spec, KerasTensor([0]), KerasTensor([0])

@fchollet
Member

In all likelihood you don't actually need to do this and you can find an alternative solution. But if you really want to return shapes, then they must be tensors. The correct way to do this would be to use convert_to_tensor(ops.shape(x)). When doing this also make sure to implement compute_output_spec (like in Ian's example above) -- this is where the error you were seeing comes from.
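
For illustration, here is a minimal sketch of that suggestion applied to the layer above. The class name is illustrative, and the output spec assumes the input spatial dims divide evenly by the stride:

import keras_core as keras
from keras_core import KerasTensor

class PatchEmbeddingWithShape(keras.layers.Layer):
    def __init__(self, out_channels=32, patch_size=7, stride=4, **kwargs):
        super().__init__(**kwargs)
        self.proj = keras.layers.Conv2D(
            filters=out_channels,
            kernel_size=patch_size,
            strides=stride,
            padding="same",
        )
        self.norm = keras.layers.LayerNormalization()
        self.stride = stride

    def call(self, x):
        x = self.proj(x)
        # Static shape for the reshape, dynamic shape (as a tensor) for the returns.
        static_shape = x.shape
        dynamic_shape = keras.ops.convert_to_tensor(keras.ops.shape(x))
        x = keras.ops.reshape(
            x, (-1, static_shape[1] * static_shape[2], static_shape[3])
        )
        x = self.norm(x)
        return x, dynamic_shape[1], dynamic_shape[2]

    def compute_output_spec(self, inputs):
        # Declare the output structure so Keras doesn't have to trace call() symbolically.
        h = inputs.shape[1] // self.stride
        w = inputs.shape[2] // self.stride
        out = KerasTensor(
            (inputs.shape[0], h * w, self.proj.filters), dtype=self.compute_dtype
        )
        return out, KerasTensor(()), KerasTensor(())

The static shape only feeds the reshape; ops.shape + convert_to_tensor only feed the returned values, so everything the layer returns is a tensor.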

@ianstenbit
Contributor

I think we can close this as WAI (working as intended). My comment above shows how we can return constants from layers for use cases like this, but as a general rule this is an anti-pattern that should be avoided where possible.

@DavidLandup0
Author

Thank you both for the replies! @fchollet @ianstenbit
I'll probably go with pre-computing the H, W values instead of returning them from a layer. We probably won't need this feature - I just wasn't sure initially whether it was intended, since it was allowed in Keras 2.0.
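
A minimal sketch of that direction, assuming the stride and the input spatial dims are known up front (the class below is a variant of the earlier repro layer that returns only the tensor):

import keras_core as keras
import numpy as np

class OverlappingPatchingAndEmbedding(keras.layers.Layer):
    # Variant that returns only the tensor; H and W are derived outside the layer.
    def __init__(self, out_channels=32, patch_size=7, stride=4, **kwargs):
        super().__init__(**kwargs)
        self.proj = keras.layers.Conv2D(
            filters=out_channels,
            kernel_size=patch_size,
            strides=stride,
            padding="same",
        )
        self.norm = keras.layers.LayerNormalization()

    def call(self, x):
        x = self.proj(x)
        shape = x.shape
        x = keras.ops.reshape(x, (-1, shape[1] * shape[2], shape[3]))
        return self.norm(x)

stride = 4
inputs = np.random.uniform(size=(1, 16, 16, 8))

# Pre-compute the post-projection spatial dims from the input shape and stride
# (padding="same", so each spatial dim shrinks by the stride).
h = inputs.shape[1] // stride  # 4
w = inputs.shape[2] // stride  # 4

x = OverlappingPatchingAndEmbedding(stride=stride)(inputs)
# h and w stay available downstream without the layer having to return them.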
