
Extending multi-io support #1051

Closed
wants to merge 1 commit into from

Conversation

@elanmart

Extends #1018 and #1035.

The PR includes:

  • Simple modification to the Graph API (only one input argument; modify set_previous and allow more flexible connectivity using multi-io layers)
  • MultiInputOutputLayer (name very much [WIP])

The goal was to allow arbitrary connectivity between layers inside a Graph.
I think it is more intuitive than the join mode inside a Merge.

Also, the new MultiInputOutputLayer has two additional advantages: it replicates the Graph API and lets users easily create new layers.

Below you can find a simple example of using the new API to build an attention module (this will become useful as soon as we figure out how to write a Recurrent container):

# Assumed imports for this sketch (Keras 0.x-era modules;
# MultiInputOutputLayer is the class introduced by this PR):
import theano.tensor as T
from keras import activations, initializations
from keras.activations import softmax

class Attention(MultiInputOutputLayer):
    """Given a hidden state to condition on and a sequence to attend over,
    returns a context C, being a weighted average of the attended sequence."""

    def __init__(self, hidden_dim, init='glorot_uniform', activation='tanh', weights=None,
                 input_dim=None, **kwargs):
        self.init = initializations.get(init)
        self.activation = activations.get(activation)
        self.hidden_dim = hidden_dim
        super(Attention, self).__init__(**kwargs)

    @property
    def inputs_info(self):
        # Named inputs and their expected ndim: 'state' is 2D (batch, features),
        # 'attended' is 3D (batch, time, features).
        return {'state': 2, 'attended': 3}

    def build(self):
        input_dim = self.inputs['state'].input_shape[1]
        sequence_dim = self.inputs['attended'].input_shape[2]

        # A small MLP scoring each timestep: a hidden projection followed by
        # a single scalar score per timestep.
        self.W_hid = self.init((input_dim + sequence_dim, self.hidden_dim))
        self.W_sft = self.init((self.hidden_dim, 1))
        self.b_hid = self.init((self.hidden_dim,))
        self.b_sft = self.init((1,))

        self.params = [self.W_hid, self.W_sft, self.b_hid, self.b_sft]

    @property
    def output_shape(self):
        return (None, 1)

    def get_output(self, train=False):
        ins = self.get_input(train)
        state, attended = ins['state'], ins['attended']

        # Broadcast the state across time and pair it with every timestep.
        state = state.dimshuffle(0, 'x', 1)
        repeated_state = T.repeat(state, attended.shape[1], axis=1)

        X = T.concatenate((repeated_state, attended), 2)

        h = self.activation(T.dot(X, self.W_hid) + self.b_hid)
        pre_softmax = self.activation(T.dot(h, self.W_sft) + self.b_sft)
        pre_softmax = pre_softmax.reshape((pre_softmax.shape[0], pre_softmax.shape[1]))

        # Normalized attention weights over the timesteps.
        return softmax(pre_softmax)

    def get_config(self):
        return {}

# -------------------------------------------------------------------------------------

# Assuming Keras 0.x imports:
from keras.models import Graph
from keras.optimizers import Adam

g = Graph()

g.add_input('X', (100,))
g.add_input('att', (None, 250))

g.add_node(Attention(256), 'attention', input={'state':'X', 'attended':'att'}, create_output=True)

g.compile(Adam(), {'attention':'mse'})
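Training would then presumably follow the usual Graph convention of passing a dict keyed by node names. A minimal sketch assuming Keras 0.x fit semantics; X_data, att_data, and targets are hypothetical arrays:

# Hypothetical training call: inputs and targets are keyed by the
# input/output node names defined above.
g.fit({'X': X_data, 'att': att_data, 'attention': targets}, nb_epoch=10)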

ping @EderSantana, I'd love to hear your opinion.

@dbonadiman
Contributor

I seriously like the idea; it is more intuitive than the current API. Even the MultiLayer is a good idea, since we are going to have a lot of them (at least, all the layers I have implemented in my library follow this idea :D).

@EderSantana
Contributor

Nice, @elanmart, I will check this out soon! You also made me remember it's time to resume working on the recurrent container.

@dbonadiman
Contributor

@EderSantana I thought about it a lot :) For now I'm implementing attentive models with huge repeat vectors, time-distributed merge, and time-distributed dense :) Fairly inefficient ^_^

@@ -239,44 +246,59 @@ def add_input(self, name, input_shape, dtype='float'):
                            'input_shape': input_shape,
                            'dtype': dtype})

-    def add_node(self, layer, name, input=None, inputs=[],
-                 merge_mode='concat', concat_axis=-1, dot_axes=-1, create_output=False):
+    def add_node(self, layer, name, input=None, merge_mode='concat', concat_axis=-1, create_output=False):
Contributor

This is another API change and will break some models. It will need docs and a public announcement.

Can you keep the old inputs argument around for a while, for compatibility purposes? We could mark it as deprecated and give users a month or so to adapt.
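A minimal sketch of what such a transition shim could look like (hypothetical; it simply maps the removed inputs list onto the new single input argument and warns):

import warnings

def add_node(self, layer, name, input=None, inputs=None,
             merge_mode='concat', concat_axis=-1, create_output=False):
    # Hypothetical deprecation shim: keep accepting the old `inputs` list
    # for a transition window, forwarding it to the new `input` argument.
    if inputs is not None:
        warnings.warn('`inputs` is deprecated; pass `input` instead '
                      '(a name, a list of names, or a dict of names).',
                      DeprecationWarning)
        input = inputs
    # ... the new implementation continues from here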

@EderSantana
Contributor

@elanmart I didn't understand (but that is my fault) how a MultiInputOutputLayer connects with a regular layer. If the MIMO layer has only one output, does it work seamlessly? Also, can the outputs of a MIMO be split in a Graph model? For example, can two different conventional Layers each get an input from the MIMOLayer?

TODOs:

  • Mostly docs and usage examples.
  • Backwards compatibility and deprecation notice.
  • Get tests passing.

@elanmart
Author

@EderSantana Thanks for the comments!

> If the MIMO layer has only one output, does it work seamlessly?

Yes, but note that MIMO works only with Graphs. Anyway, you can write:

g = Graph()

g.add_input('X', (100,))
g.add_input('att', (None, 250))

g.add_node(Attention(256), 'attention', input={'state':'X', 'attended':'att'}, create_output=False)
g.add_node(Dense(128), 'd', input='attention', create_output=True)

g.compile(Adam(), {'d':'mse'})

> For example, can two different conventional Layers each get an input from the MIMOLayer?

Well, without this the MIMOLayer would be useless!

The way it is done now is what I found most convenient: given a MIMOLayer named name with outputs out_1 and out_2, two nodes will be created inside the graph: name_out_1 and name_out_2. Each of them is now a perfectly valid node, so you can plug them into whatever new Layer you want.
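For concreteness, a hypothetical sketch of this wiring (MySplitter is an imaginary MultiInputOutputLayer subclass declaring two outputs, out_1 and out_2):

# Hypothetical: under the naming scheme above, adding MySplitter as the
# node 'split' auto-creates the nodes 'split_out_1' and 'split_out_2'.
g = Graph()
g.add_input('X', (100,))
g.add_node(MySplitter(), 'split', input='X')

# Each auto-created output node is a valid identifier for downstream layers:
g.add_node(Dense(64), 'branch_a', input='split_out_1')
g.add_node(Dense(32), 'branch_b', input='split_out_2', create_output=True)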

@EderSantana
Contributor

got it, thanks for the clarifications!

        if n in self.nodes:
            to_merge.append(self.nodes[n])
        elif n in self.inputs:
            to_merge.append(self.inputs[n])
        else:
-           raise Exception('Unknown identifier: ' + n)
-       merge = Merge(to_merge, mode=merge_mode, concat_axis=concat_axis, dot_axes=dot_axes)
+           raise Exception('Unknown identifier: ' + n)
Member

Unnecessary whitespace. Please install a PEP8 linter to spot these issues during development.

@fchollet
Member

> only one input argument

Agreed, that's better.

> modify set_previous and allow more flexible connectivity using multi-io layers

Can you expand on that?

@elanmart
Author

@fchollet thanks for the comments, I'm going to fix all those small issues ASAP if the main idea gets approved.

> modify set_previous and allow more flexible connectivity using multi-io layers
>
> Can you expand on that?

As in #620 (comment):

model.add_node(layer, input={'my_layer_input_1': 'node_name_1', 'my_layer_input_2': 'node_name_2'})

Only now node_name_1 can be any Layer, or any output of a MIMOLayer / Graph. That way you can have a graph inside your model, feed one of its outputs to one Layer, and a different output to a different Layer.

@fchollet
Member

I believe it's a good idea. Since such an interface is assumption-heavy, making the UX work smoothly will require very thorough assumption checking and helpful error messages.
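A hypothetical sketch of such a check: validate a dict-valued input against the layer's declared inputs_info (the property introduced by this PR) and fail with a readable message. The helper name check_mimo_inputs is made up for illustration:

# Hypothetical helper that add_node could call before wiring a MIMO layer.
def check_mimo_inputs(layer, input):
    expected, given = set(layer.inputs_info), set(input)
    if given != expected:
        raise Exception('Layer expects inputs %s but got %s '
                        '(missing: %s, unexpected: %s).'
                        % (sorted(expected), sorted(given),
                           sorted(expected - given), sorted(given - expected)))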

@elanmart
Author

@fchollet what about the MIMOLayer itself?

@farizrahman4u
Contributor

@EderSantana @elanmart What's the recurrent container?

@elanmart
Author

See #620

@EderSantana perhaps we could team up on this one too, I've got some ideas that I think could be quite useful.

@EderSantana
Contributor

Oh yeah, @farizrahman4u and @elanmart, let's do this!!! My sample code worked on the pre-shape-inference Keras, but now, with the new modifications and so many people familiar with the Keras source, it should be much easier to do. I could make a PR with the old code and update it from there, or we could start a new one from scratch. At least we have an API idea now; the implementation will just flow... We have a chance to build the best and most intuitive way to design RNNs, totally abstracting away the for-loops. Let's continue this conversation on #620 so we don't clutter @elanmart's PR.

@bobchennan

Any progress on this issue?

@fchollet
Member

Closing outdated PRs following the release of Keras 1.0.

@fchollet closed this Apr 11, 2016