
Extending multi-io support #1051

Closed
wants to merge 1 commit into from

Conversation

@elanmart

Extends #1018 and #1035.

The PR includes:

  • Simple modification to the Graph API (only one input argument; modify set_previous and allow more flexible connectivity using multi-io layers)
  • MultiInputOutputLayer (name very much [WIP])

The goal was to allow arbitrary connectivity between layers inside a Graph.
I think it is more intuitive than the join mode inside a Merge.

Also, the new MultiInputOutputLayer has two additional advantages: it replicates the Graph API and lets users easily create new layers.

Below you can find a simple example of using the new API to build an attention module (this will become useful as soon as we figure out how to write a Recurrent container):

# Assumed imports for this sketch (Keras 0.x-era modules;
# MultiInputOutputLayer is the class introduced by this PR):
import theano.tensor as T
from keras import activations, initializations
from keras.activations import softmax

class Attention(MultiInputOutputLayer):
    """Given a hidden state to condition on and a sequence to attend over,
    returns a context C, being a weighted average of the attended sequence."""

    def __init__(self, hidden_dim, init='glorot_uniform', activation='tanh', weights=None,
                 input_dim=None, **kwargs):
        self.init = initializations.get(init)
        self.activation = activations.get(activation)
        self.hidden_dim = hidden_dim
        super(Attention, self).__init__(**kwargs)

    @property
    def inputs_info(self):
        # Named inputs and their expected ndim: 'state' is 2D (batch, features),
        # 'attended' is 3D (batch, time, features).
        return {'state': 2, 'attended': 3}

    def build(self):
        input_dim = self.inputs['state'].input_shape[1]
        sequence_dim = self.inputs['attended'].input_shape[2]

        # A small MLP scoring each timestep: a hidden projection followed by
        # a single scalar score per timestep.
        self.W_hid = self.init((input_dim + sequence_dim, self.hidden_dim))
        self.W_sft = self.init((self.hidden_dim, 1))
        self.b_hid = self.init((self.hidden_dim,))
        self.b_sft = self.init((1,))

        self.params = [self.W_hid, self.W_sft, self.b_hid, self.b_sft]

    @property
    def output_shape(self):
        return (None, 1)

    def get_output(self, train=False):
        ins = self.get_input(train)
        state, attended = ins['state'], ins['attended']

        # Broadcast the state across time and pair it with every timestep.
        state = state.dimshuffle(0, 'x', 1)
        repeated_state = T.repeat(state, attended.shape[1], axis=1)

        X = T.concatenate((repeated_state, attended), 2)

        h = self.activation(T.dot(X, self.W_hid) + self.b_hid)
        pre_softmax = self.activation(T.dot(h, self.W_sft) + self.b_sft)
        pre_softmax = pre_softmax.reshape((pre_softmax.shape[0], pre_softmax.shape[1]))

        # Normalized attention weights over the timesteps.
        return softmax(pre_softmax)

    def get_config(self):
        return {}

# -------------------------------------------------------------------------------------

# Assuming Keras 0.x imports:
from keras.models import Graph
from keras.optimizers import Adam

g = Graph()

g.add_input('X', (100,))
g.add_input('att', (None, 250))

g.add_node(Attention(256), 'attention', input={'state':'X', 'attended':'att'}, create_output=True)

g.compile(Adam(), {'attention':'mse'})
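Training would then presumably follow the usual Graph convention of passing a dict keyed by node names. A minimal sketch assuming Keras 0.x fit semantics; X_data, att_data, and targets are hypothetical arrays:

# Hypothetical training call: inputs and targets are keyed by the
# input/output node names defined above.
g.fit({'X': X_data, 'att': att_data, 'attention': targets}, nb_epoch=10)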

ping @EderSantana, I'd love to hear your opinion.

@dbonadiman
Contributor

I seriously like the idea; it is more intuitive than the current API. Even the MultiLayer is a good idea, since we are going to have a lot of them (at least, all the layers I have implemented in my library follow this idea :D).

@EderSantana
Contributor

Nice, @elanmart, I will check this out soon! You also made me remember it's time to resume working on the recurrent container.

@dbonadiman
Contributor

@EderSantana I thought about it a lot :) For now I'm implementing attentive models with huge repeat vectors, time-distributed merge, and time-distributed dense :) Fairly inefficient ^_^

@@ -239,44 +246,59 @@ def add_input(self, name, input_shape, dtype='float'):
                            'input_shape': input_shape,
                            'dtype': dtype})

-    def add_node(self, layer, name, input=None, inputs=[],
-                 merge_mode='concat', concat_axis=-1, dot_axes=-1, create_output=False):
+    def add_node(self, layer, name, input=None, merge_mode='concat', concat_axis=-1, create_output=False):
Contributor

This is another API change and will break some models. It will need docs and a public announcement.

Can you keep the old inputs argument around for a while, for compatibility purposes? We could mark it as deprecated and give users a month or so to adapt.
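A minimal sketch of what such a transition shim could look like (hypothetical; it simply maps the removed inputs list onto the new single input argument and warns):

import warnings

def add_node(self, layer, name, input=None, inputs=None,
             merge_mode='concat', concat_axis=-1, create_output=False):
    # Hypothetical deprecation shim: keep accepting the old `inputs` list
    # for a transition window, forwarding it to the new `input` argument.
    if inputs is not None:
        warnings.warn('`inputs` is deprecated; pass `input` instead '
                      '(a name, a list of names, or a dict of names).',
                      DeprecationWarning)
        input = inputs
    # ... the new implementation continues from here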

@EderSantana
Contributor

@elanmart I didn't understand (but that is my fault) how a MultiInputOutputLayer connects with a regular layer. If the MIMO layer has only one output, does it work seamlessly? Also, can the outputs of a MIMO be split in a Graph model? For example, can two different conventional Layers each get an input from the MIMOLayer?

TODOs:

  • Mostly docs and usage examples.
  • Backwards compatibility and deprecation notice.
  • Get tests passing.

@elanmart
Author

@EderSantana Thanks for the comments!

> If the MIMO layer has only one output, does it work seamlessly?

Yes, but note that MIMO works only with Graphs. Anyway, you can write:

g = Graph()

g.add_input('X', (100,))
g.add_input('att', (None, 250))

g.add_node(Attention(256), 'attention', input={'state':'X', 'attended':'att'}, create_output=False)
g.add_node(Dense(128), 'd', input='attention', create_output=True)

g.compile(Adam(), {'d':'mse'})

> For example, can two different conventional Layers each get an input from the MIMOLayer?

Well, without this the MIMOLayer would be useless!

The way it is done now is what I found most convenient: given a MIMOLayer named name with outputs out_1 and out_2, two nodes will be created inside the graph: name_out_1 and name_out_2. Each of them is now a perfectly valid node, so you can plug them into whatever new Layer you want.
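For concreteness, a hypothetical sketch of this wiring (MySplitter is an imaginary MultiInputOutputLayer subclass declaring two outputs, out_1 and out_2):

# Hypothetical: under the naming scheme above, adding MySplitter as the
# node 'split' auto-creates the nodes 'split_out_1' and 'split_out_2'.
g = Graph()
g.add_input('X', (100,))
g.add_node(MySplitter(), 'split', input='X')

# Each auto-created output node is a valid identifier for downstream layers:
g.add_node(Dense(64), 'branch_a', input='split_out_1')
g.add_node(Dense(32), 'branch_b', input='split_out_2', create_output=True)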

@EderSantana
Contributor

got it, thanks for the clarifications!

        if n in self.nodes:
            to_merge.append(self.nodes[n])
        elif n in self.inputs:
            to_merge.append(self.inputs[n])
        else:
-           raise Exception('Unknown identifier: ' + n)
-       merge = Merge(to_merge, mode=merge_mode, concat_axis=concat_axis, dot_axes=dot_axes)
+           raise Exception('Unknown identifier: ' + n)
Member

Unnecessary whitespace. Please install a PEP8 linter to spot these issues during development.

@fchollet
Member

> only one input argument

Agreed, that's better.

> modify set_previous and allow more flexible connectivity using multi-io layers

Can you expand on that?

@elanmart
Author

@fchollet thanks for the comments, I'm going to fix all those small issues ASAP if the main idea gets approved.

> modify set_previous and allow more flexible connectivity using multi-io layers
>
> Can you expand on that?

As in #620 (comment):

model.add_node(layer, input={'my_layer_input_1': 'node_name_1', 'my_layer_input_2': 'node_name_2'})

Only now node_name_1 can be any Layer, or any output of a MIMOLayer / Graph. That way you can have a graph inside your model, feed one of its outputs to one Layer, and a different output to a different Layer.

@fchollet
Member

I believe it's a good idea. Since such an interface is assumption-heavy, making the UX work smoothly will require very thorough assumption checking and helpful error messages.
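A hypothetical sketch of such a check: validate a dict-valued input against the layer's declared inputs_info (the property introduced by this PR) and fail with a readable message. The helper name check_mimo_inputs is made up for illustration:

# Hypothetical helper that add_node could call before wiring a MIMO layer.
def check_mimo_inputs(layer, input):
    expected, given = set(layer.inputs_info), set(input)
    if given != expected:
        raise Exception('Layer expects inputs %s but got %s '
                        '(missing: %s, unexpected: %s).'
                        % (sorted(expected), sorted(given),
                           sorted(expected - given), sorted(given - expected)))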

@elanmart
Author

@fchollet what about the MIMOLayer itself?

@farizrahman4u
Contributor

@EderSantana @elanmart What's the recurrent container?

@elanmart
Author

See #620

@EderSantana perhaps we could team up on this one too, I've got some ideas that I think could be quite useful.

@EderSantana
Contributor

Oh yeah, @farizrahman4u and @elanmart, let's do this!!! My sample code worked on the pre-shape-inference Keras, but now, with the new modifications and so many people familiar with the Keras source, it should be much easier to do. I could make a PR with the old code and update it from there, or we could start a new one from scratch. At least we have an API idea now; the implementation will just flow... We have a chance to build the best and most intuitive way to design RNNs, totally abstracting away the for-loops. Let's continue this conversation on #620 so we don't clutter @elanmart's PR.

@bobchennan

Any progress on this issue?

@fchollet
Member

Closing outdated PRs following the release of Keras 1.0.

@fchollet closed this Apr 11, 2016