How to handle Sisyphus hashes #51

Closed
albertz opened this issue Nov 3, 2021 · 15 comments

albertz (Member) commented Nov 3, 2021

When this becomes more widely used, the resulting net dicts will often also be used for Sisyphus hashes. This means that every minor change, even something like a changed layer-name heuristic, can lead to changed Sis hashes.

I have heard already about different opinions and preferences on this aspect, so returnn-common will not enforce anything.

I expect the net dicts to change quite often even when there is no semantic or logical change (e.g. just some layer-name heuristic changed, though without changing param name spaces). The consequence is that people either don't update returnn-common (which is bad), end up with forks of returnn-common with only selected changes (even worse), or we are forced to avoid any changes to the net dict unless really necessary, which would restrict us or require ugly workarounds later (also not good).

Because of that, my original idea was to not use the resulting net dict but some other intermediate representation for Sis hashes, similar to how a Sisyphus Job object can also make some aspects of the Sis hash explicit, e.g. by overriding the hash function. However, this is not implemented yet, and it will probably have other drawbacks depending on the specific implementation. One concern was that people were afraid that actual semantic changes might not lead to a changed Sis hash due to potential bugs in this implementation. My counter argument would be that this can be true for any Sisyphus Job with custom hash logic (or even without, when it depends on external things).

In any case, maybe we should think a bit about this before the first release.
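To make the "explicit hash" idea concrete, here is a minimal sketch in plain Python. The `sis_hash_helper` stand-in and the `Linear` building block are hypothetical, not an actual returnn-common or Sisyphus API; the point is only that a module could define its hash from its class name and explicit arguments rather than from the generated net dict.

```python
import hashlib

def sis_hash_helper(obj) -> str:
    # Minimal stand-in for Sisyphus' hashing of nested structures (illustrative only)
    if hasattr(obj, "_sis_hash"):
        return obj._sis_hash()
    if isinstance(obj, dict):
        data = "dict:" + ",".join(
            f"{k}={sis_hash_helper(v)}" for k, v in sorted(obj.items()))
    elif isinstance(obj, (list, tuple)):
        data = "list:" + ",".join(sis_hash_helper(v) for v in obj)
    else:
        data = repr(obj)
    return hashlib.sha256(data.encode()).hexdigest()

class Linear:
    """Hypothetical building block: hashed via class name + explicit args,
    independent of whatever net dict it would generate."""
    def __init__(self, out_dim: int):
        self.out_dim = out_dim

    def _sis_hash(self) -> str:
        return sis_hash_helper({"class": type(self).__name__, "out_dim": self.out_dim})
```

With such a scheme, layer-name heuristics or the net-dict layout could change without affecting the hash, since only the explicit interface enters it.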

michelwi commented Nov 3, 2021

I agree that frequent changes of Sisyphus hashes should be avoided. This is true both for cosmetic changes (changed names, refactoring of subnet units, etc.), which are only annoying, and for semantic changes (different implementation/parameters), which would break consistency.

> And then the consequence is that people either don't update returnn-common (which is bad),

This would be my envisioned solution. For a set of experiments to be reproducible, the exact version of the used recipes must be defined. Therefore my suggestion would be: if returnn_common is used, a specific commit is selected for one set of experiments (via CloneGitRepositoryJob). This version stays fixed for this specific set of experiments until the user consciously decides to update because they need newer functionality.

Then if a different set of experiments is started, the user would use the most recent version of returnn_common for the new experiments. Anything else will be unusable if there is the possibility of semantic changes within a set of currently running experiments.
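The pinning itself could look roughly like this in a Sisyphus setup (a sketch, not standalone-runnable; it assumes i6_core's CloneGitRepositoryJob, and the exact argument names are assumptions):

```python
# Sketch: pin returnn_common to a fixed commit for one set of experiments.
from i6_core.tools.git import CloneGitRepositoryJob

returnn_common_repo = CloneGitRepositoryJob(
    url="https://github.com/rwth-i6/returnn_common.git",
    commit="abcdef0",  # placeholder; chosen consciously per experiment set
)
# The resulting repository path enters the Sis hash, so (consciously)
# updating the commit changes the hashes of downstream jobs.
```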

> Because of that, my original idea was to not use the resulting net dict but some other intermediate representation for Sis hashes.

Orthogonally to my first argument, this is a good idea to maximize the hash-wise compatibility between sets of experiments: for instance, do not hash the resulting network dictionary, but base the hash on the structure of the building blocks and all used parameters.

> Although my counter argument would be that this could be true for any Sisyphus Job with some custom hash logic (or even without when it depends on external things).

This is true. We try to use custom hash logic to only include/exclude parameters of which we are sure that they do not affect the outcome (e.g. wallclock time, log verbosity). There are some possibilities to change important parameters without altering the hash (e.g. using external Paths or manually overwriting the hashes), but in any case the user has to put in effort to break the system.
If we include custom hash logic in returnn_common, then all users would have to put in effort to ensure that the hashes are consistent. I don't know if non-Sisyphus users would like to be forced to do the extra work.

Also, we probably should define unit tests that check that the hashes stay the same, and probably also that they change when we change some parameter that we deem semantic.
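Such tests could be as simple as the following self-contained sketch (the JSON-based `net_dict_hash` is an illustrative stand-in, not the actual Sisyphus hashing):

```python
import hashlib
import json

def net_dict_hash(net_dict) -> str:
    # Deterministic stand-in for the actual Sisyphus hashing (illustrative only)
    return hashlib.sha256(
        json.dumps(net_dict, sort_keys=True).encode()).hexdigest()

def test_hash_stays_stable():
    net = {"output": {"class": "linear", "n_out": 128, "from": "data"}}
    # Key order is cosmetic and must not matter:
    reordered = {"output": {"from": "data", "n_out": 128, "class": "linear"}}
    assert net_dict_hash(net) == net_dict_hash(reordered)

def test_hash_changes_on_semantic_change():
    a = {"output": {"class": "linear", "n_out": 128}}
    b = {"output": {"class": "linear", "n_out": 256}}
    assert net_dict_hash(a) != net_dict_hash(b)
```

In practice one would freeze reference hashes for known configurations and update them consciously whenever a semantic change is intended.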

albertz (Member, Author) commented Nov 3, 2021

Note on semantic changes: returnn-common is intended to be as strict as RETURNN itself on this, and also just like i6_core, so there should never be any semantic changes, unless maybe we introduce a new behavior version.

albertz (Member, Author) commented Nov 3, 2021

It is also a question what the actual Sisyphus setup structure looks like. My understanding was that you would clone i6_core into your recipes, and next to it returnn_common. What you suggest now is to treat returnn_common differently from i6_core. This was not the original intention.

returnn_common was supposed to be similar to i6_core. The focus of returnn_common is to contain RETURNN-specific building blocks for the RETURNN config, while i6_core is more Sisyphus-recipe specific.

michelwi commented Nov 3, 2021

> My understanding was that you would clone i6_core into your recipes, and next to it returnn_common.

Ah I see. Yes, if the intention of returnn-common is to be as stable as i6_core then this is a viable solution. But your initial statement was

> I expect the net dicts to change quite often even when there is no semantic or logical change

which somehow contradicts your statement about stability.

So then let me change my comment:
If returnn-common is as stable as i6_core, then there will be no problems with changing hashes.
If it is less stable w.r.t. cosmetic changes, but stable w.r.t. semantic changes, then the fixing of one specific commit for one set of experiments would be the easiest solution.

But we can still discuss mechanisms to prevent cosmetic changes in the net dict from changing the Sis hash.

albertz (Member, Author) commented Nov 3, 2021

OK, to make it more specific: e.g. when you define some model via the new functions, something like:

class MyModel(nn.Module):
  def __init__(self):
    super().__init__()
    self.conformer_encoder = ConformerEncoder(...)
    ...

  def forward(self, x):
    y = self.conformer_encoder(x)
    ...

Then semantically or logically, nothing should ever change for this code snippet, i.e.:

  • The numerical values which come out here when this is evaluated will not change up to numerical precision.
  • The TF parameter names will not change.

However, potentially things which can change:

  • The net dict in whatever way.
  • Layer names (TF parameter names will still be consistent by explicit name scopes in any case).
  • Maybe due to some better implementation the internal structure in some way.
  • The TF computation graph (also depending on RETURNN version).
  • Whatever other cosmetic changes which would not influence the semantic or logic behavior described above.

albertz (Member, Author) commented Nov 3, 2021

Maybe the outcome here is also that we say we should try to minimize all cosmetic changes as much as possible once we have the first release.

Or we say we want explicit Sis hashes.

JackTemaki (Contributor) commented:

> However, potentially things which can change:

But this clearly means that returnn-common needs to be treated as "unstable" with respect to Sisyphus requirements. This was expected, and using a commit hash for returnn-common would be my favored solution (or rather, the only way I see right now in which I would agree to use returnn-common). Creating explicit Sis hashes will be extremely complicated, as you need to find a suitable hash computation to correctly handle all future changes without even knowing them. This is not realistic.

albertz (Member, Author) commented Nov 3, 2021

> > However, potentially things which can change: ... (cosmetic things)

> But this clearly means that returnn-common needs to be treated as "unstable" with respect to Sisyphus requirements. This was expected, and using a commit hash for returnn-common would be my favored solution (or rather, the only way I see right now in which I would agree to use returnn-common).

I mean, it is up to us. We can also decide that we want that all these things (cosmetic things) stay stable. Or that we make Sis hashes more explicit. Or maybe something in between. So it is certainly possible that we agree on some policy here for returnn-common such that it can be treated as stable w.r.t. Sisyphus.

So, the question is rather, what do we want? What would make our workflow easier? What would make it easier to share code, common building blocks, etc?

Just the same question applies to i6_core. So I wonder why we should decide differently here for returnn_common than for i6_core?

> Creating explicit sis hashes will be extremely complicated, as you need to find a suitable hash computation to correctly handle all future changes without even knowing them. This is not realistic.

I still would argue this is exactly the same situation as for any other Job we implement in i6_core.

albertz (Member, Author) commented Nov 3, 2021

We actually had a similar discussion in #2.

The same question also applies to i6_experiments, although I think i6_experiments is by design more unstable, so the treatment of i6_experiments could be different.

In any case, I think we should treat returnn_common and i6_core in a similar way.

michelwi commented Nov 4, 2021

> Creating explicit sis hashes will be extremely complicated, as you need to find a suitable hash computation to correctly handle all future changes without even knowing them.

The Sisyphus hash mechanism includes (by default) the class name and all given parameters. In returnn-common a similar thing can be done: e.g. each Module that would lead to something being printed in the config gets a hash function that hashes all (important?) parameters plus the hashes of all inputs.
If future updates add new functionality (i.e. new parameters), then we can even preserve the old hashes with a mechanism similar to __sis_hash_exclude__ (i.e. if a parameter takes a value corresponding to the old behavior, it is not considered for the hash).
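A sketch of that exclusion mechanism in plain Python (hypothetical names, not the actual Sisyphus or returnn-common API): parameters still at their default value are dropped before hashing, so adding a new parameter with a backward-compatible default keeps old hashes intact.

```python
import hashlib

# Hypothetical defaults of some module; imagine "dropout" was added later,
# with a default that reproduces the old behavior.
DEFAULTS = {"dropout": 0.0, "with_bias": True}

def module_hash(module_name: str, params: dict) -> str:
    # Exclude parameters that still equal their default (__sis_hash_exclude__-style),
    # so configs written before a parameter existed keep their hash.
    relevant = {k: v for k, v in params.items()
                if k not in DEFAULTS or DEFAULTS[k] != v}
    blob = module_name + "|" + "|".join(
        f"{k}={v!r}" for k, v in sorted(relevant.items()))
    return hashlib.sha256(blob.encode()).hexdigest()
```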

albertz (Member, Author) commented Feb 10, 2022

Some further thoughts and results from recent discussions:

If we implement this, it would basically follow what @michelwi outlined above, similar to how hashing of Sisyphus jobs and their arguments currently works in Sisyphus. But this does not outline the whole picture. It is not just a single module and the arguments to it. Currently it could look like:

model = Model(...)
with nn.NameCtx.new_root() as name_ctx:
  in_ = net.extern_data(nn.Data("data", ...))
  targets = net.extern_data(nn.Data("classes", ...))
  out = model(in_)
  out.mark_as_default_output()
  loss = nn.cross_entropy(out, targets)
  loss.mark_as_loss()

It is not enough to just hash model here. All the other details around it matter as well. There could also be multiple models combined in some way. Basically it means hashing the whole higher-level computation graph, where the nodes are any such module or function calls. For function calls, it is a bit arbitrary where to define the boundary. For modules and functions, we probably only want to treat the base modules and functions from returnn-common as stable, while any user-defined modules and functions can possibly change. That would define the boundary. But handling this well might get a bit complex. Additionally, we need to hash all the markings (mark_as_default_output, mark_as_loss), the extern data, and the dim tags.
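The graph hashing described above could be sketched like this (pure Python; the `Call` node type is hypothetical and the actual returnn-common structures differ): each node hashes its own name and parameters together with the hashes of its inputs, depth-first, and markings enter the graph as nodes as well.

```python
import hashlib

class Call:
    """One node of the higher-level computation graph:
    a module/function call with its parameters and input nodes."""
    def __init__(self, name, params, inputs=()):
        self.name, self.params, self.inputs = name, dict(params), list(inputs)

    def graph_hash(self) -> str:
        # Depth-first: the hash covers this node and, recursively, everything feeding it.
        inputs = ",".join(c.graph_hash() for c in self.inputs)
        params = ",".join(f"{k}={v!r}" for k, v in sorted(self.params.items()))
        return hashlib.sha256(f"{self.name}({params})[{inputs}]".encode()).hexdigest()

# Example graph, including extern data and a loss marking:
data = Call("extern_data", {"name": "data"})
classes = Call("extern_data", {"name": "classes"})
enc = Call("ConformerEncoder", {"dim": 512}, [data])
loss = Call("mark_as_loss", {}, [Call("cross_entropy", {}, [enc, classes])])
```

The open question from the text remains where to stop recursing: base modules from returnn-common could hash by name and parameters, while user-defined modules would need to hash their actual definition.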

We are now starting to get some first user experience (see #98) without any custom hashing logic, meaning that it is really only the net dict, and the dim tags enter via their variable name (via Sis CodeWrapper) plus the Sis RETURNN config prolog code for the dim tags. We will see how soon or how frequently we run into net dict changes or other changes which cause the hashes to change.

JackTemaki (Contributor) commented Feb 10, 2022

> There could also be multiple models combined in some way

Why is this a problem at all? Introducing custom hashing does not mean overriding the "global" net hashing in any way, but providing specific hashes (or rather: hash overrides) for some base modules/functions, just as you said.

> Additionally, we need to hash all the markings (mark_as_default_output, mark_as_loss), and extern data and dim tags.

So unless we are specifically "not" hashing them, they will always be hashed by default. So why is there any reason to be concerned about this?

The hashing is a depth-first top down approach, always starting from the config dict "root", so all we would do is to replace the default hashing with a custom one for some sub-trees/nodes but never the global hashing itself.

albertz (Member, Author) commented Feb 11, 2022

> > There could also be multiple models combined in some way
>
> Why is this a problem at all?

What do you mean by problem? There is no problem. It's just more complex.

> Introducing custom hashing does not mean to override the "global" net hashing in any way, but to provide specific hashes (or rather: hash overrides) for some base modules/functions just as you said.

It depends on where and how you implement the hash. E.g. the solution I have in mind (but this is just one of many potential solutions) would be that you put one or multiple objects into the config which are similar to CodeWrapper, i.e. they directly produce the code as a string, or maybe also the dict (not really relevant here), and these objects would define a custom hash.

Let's say this is a single object, which somehow represents extern_data and network (the net dict) or alternatively the get_network function.

Now, how do you define the hash? This is what I'm discussing here (all the other technical things are not so relevant, only details).

If there would be a single model module, it could look very simple like this (in this CodeWrapper-like object):

def _sis_hash(self):
  return sis_hash_helper(self.model)

And then of course the Module also needs a corresponding _sis_hash method, as explained before, like what @michelwi wrote, but this is clear anyway, and we probably need it in any case.

However, as I explained, it is not that simple. There is not a single model object which clearly defines the hash. The hashing needs to cover all the things I explained in my previous post. Also, if self.model is a user-defined model, it is not as simple either, because then we have to assume that it can potentially change.

But there is no problem. Everything can be solved. I just said that it is a bit more complex than we might have thought initially.

> > Additionally, we need to hash all the markings (mark_as_default_output, mark_as_loss), and extern data and dim tags.

> So unless we are specifically "not" hashing them, they will always be hashed by default. So why is there any reason to be concerned about this?

I don't understand. I am discussing here the case that we explicitly define the hash. But even if you don't explicitly define it, how will this be hashed? We are discussing the case of not just using the net dict for the hash. So what is the hash then? This is the question here.

> The hashing is a depth-first top down approach, always starting from the config dict "root", so all we would do is to replace the default hashing with a custom one for some sub-trees/nodes but never the global hashing itself.

This is all clear, but how is it related to what I said? What do you put into the config? At what point would you define the custom hashing? For what objects exactly? We are discussing the case of not using the net dict, so: what objects? As explained, just the object structure is not enough. When you change the code for losses or other things, this all needs to be reflected in the hash.

albertz (Member, Author) commented Mar 8, 2022

Another idea for a simple hashing scheme: We could just hash the code itself.

Excluding all of returnn-common, and excluding other irrelevant things. So e.g. if the user has the model definition in a config, alongside with other unrelated stuff, we only should use the model definition code itself.

So, this is maybe the tricky part, to really get the relevant code. To figure out the needed classes and functions.

Also, some details need to be clarified further. Is it the code string plus the relative filename? Or the __qualname__ of classes and functions instead of the filename? Should we normalize the code somehow, by removing empty lines and comments? Or even directly use the AST?
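A minimal sketch of the AST variant, using only the standard library: parsing and re-dumping the source normalizes away comments, blank lines, and most formatting, so only structural changes affect the hash. In a real setup the source would come from inspect.getsource on the user's classes and functions.

```python
import ast
import hashlib

def code_hash(source: str) -> str:
    # Parse to an AST and hash its dump: comments and blank lines vanish,
    # while any structural change in the code changes the hash.
    tree = ast.parse(source)
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()

# Cosmetically different, structurally identical definitions:
same_a = "def f(x):\n    # cosmetic comment\n    return x + 1\n"
same_b = "def f(x):\n\n    return x + 1\n"
# A real semantic change:
different = "def f(x):\n    return x + 2\n"
```

Note that this still hashes names of local variables and the function name itself; whether those should count as cosmetic is exactly the kind of detail that would need to be decided.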

albertz (Member, Author) commented Jun 13, 2022

Note, there is some ongoing discussion on this aspect here: rwth-i6/i6_experiments#63

In the current implementation, there does not need to be any custom logic by returnn_common at all and this is all handled by the helpers in i6_experiments.

I think we can close this issue here for now. We can reopen it when we think that we should do sth on returnn_common side about this.

albertz closed this as completed Jun 13, 2022