How to handle Sisyphus hashes #51
I agree that frequent changes of Sisyphus hashes should be avoided. This is true for cosmetic changes (changed names, refactoring of subnet units, etc.), which are merely annoying, and also for semantic changes (different implementation/parameters), where it would break consistency.
This would be my envisioned solution: for a set of experiments to be reproducible, the exact version of the used recipes must be defined. Therefore my suggestion would be that if returnn_common is used, a specific commit is selected for one set of experiments (via …). Then if a different set of experiments is started, the user would use the most recent version of returnn_common for the new experiments. Anything else will be unusable if there is the possibility of semantic changes within a set of currently running experiments.
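One possible way to pin such a commit, sketched under the assumption that i6_core offers a clone job roughly like this (module path, signature and output attribute may differ in the real i6_core):

```python
# Assumption: i6_core provides a job like this; check the actual API.
from i6_core.tools.git import CloneGitRepositoryJob

# Pin returnn_common to one specific commit for this set of experiments,
# so the generated configs (and thus the Sis hashes) cannot silently change
# when the repository moves on.
returnn_common_repo = CloneGitRepositoryJob(
    url="https://github.com/rwth-i6/returnn_common.git",
    commit="<commit-hash-for-this-set-of-experiments>",  # placeholder
)
# returnn_common_repo.out_repository would then be passed to the jobs
# that construct the RETURNN config.
```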
Orthogonally to my first argument, this is a good idea to maximize the hash-wise compatibility between sets of experiments: for instance, do not hash the resulting network dictionary, but base the hash on the structure of the building blocks and all used parameters.
This is true. We try to use custom hash logic to include/exclude only parameters of which we are sure that they do not affect the outcome (e.g. wallclock time, log verbosity). There are some possibilities to change important parameters without altering the hash (e.g. using external Paths or manually overwriting the hashes), but in any case the user has to put in effort to break the system. Also, we probably should define unit tests to check that the hashes stay the same, and probably also to check that they change when we change some/any parameter that we deem to be semantic.
Note on semantic changes: returnn-common is intended to be as strict as RETURNN itself on this, and also just like i6_core, so there should never be any semantic changes. Unless maybe we introduce a new behavior version.
It is also a question what the actual Sisyphus setup structure looks like. My understanding was that you would clone i6_core into your recipes, and next to it returnn_common. What you suggest now is to treat returnn_common differently from i6_core. This was not the original intention. returnn_common was supposed to be similar to i6_core. Just the focus here in returnn_common was to contain RETURNN-specific building blocks for the RETURNN config, while i6_core is more Sisyphus-recipe specific.
Ah, I see. Yes, if the intention of returnn-common is to be as stable as i6_core, then this is a viable solution. But your initial statement was
…, which somehow contradicts your statement about stability. So then let me change my comment. But still, we can discuss mechanisms to prevent cosmetic changes in the net dict from changing the sis-hash.
Ok, to make it more specific: e.g. when you define some model via the new functions, something like:
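The original snippet is not preserved here; as a stand-in, a minimal hypothetical sketch of such a model definition (names and signatures are assumptions, not necessarily the actual returnn-common API):

```python
# Hypothetical sketch only; the real returnn-common API may differ.
from returnn_common import nn


class MyModel(nn.Module):
    """Small model built from returnn-common building blocks."""

    def __init__(self, out_dim: nn.Dim):
        super().__init__()
        self.linear = nn.Linear(out_dim)  # assumed signature

    def __call__(self, x: nn.Tensor) -> nn.Tensor:
        return nn.relu(self.linear(x))
```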
Then semantically or logically, nothing should ever change for this code snippet, i.e.:
However, potentially things which can change:
Maybe the outcome here is also that we say we should try to minimize all cosmetic changes as much as possible once we have the first release. Or we say we want explicit Sis hashes.
But this clearly means that returnn-common needs to be treated as "unstable" with respect to Sisyphus requirements. This was expected, and using a commit hash for returnn-common would be my favored solution (or rather, the only way I see right now in which I would agree to use returnn-common). Creating explicit sis hashes will be extremely complicated, as you need to find a suitable hash computation that correctly handles all future changes without even knowing them. This is not realistic.
I mean, it is up to us. We can also decide that we want all these (cosmetic) things to stay stable. Or that we make Sis hashes more explicit. Or maybe something in between. So it is certainly possible that we agree on some policy here for returnn-common such that it can be treated as stable w.r.t. Sisyphus. So, the question is rather: what do we want? What would make our workflow easier? What would make it easier to share code, common building blocks, etc.? Just the same question applies to i6_core. So I wonder why we should decide differently here for returnn_common than for i6_core?
I still would argue this is exactly the same situation as for any other …
We actually had a similar discussion in #2. The same question also applies to i6_experiments, although I think i6_experiments is by design more unstable, so the treatment of i6_experiments could be different. In any case, I think we should treat returnn_common and i6_core in a similar way.
The Sisyphus hash mechanism includes (by default) the class name and all given parameters. In returnn-common a similar thing can be done: e.g. each Module that would lead to something being printed in the config gets a hash function that hashes all (important?) parameters plus the hashes of all inputs.
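A minimal standalone sketch of that idea (a hypothetical helper, not actual returnn-common or Sisyphus code):

```python
import hashlib
from typing import Any, Dict, Sequence


def module_hash(module_class_name: str, params: Dict[str, Any], input_hashes: Sequence[str]) -> str:
    """Sketch of a per-module hash, analogous to how Sisyphus hashes a Job
    by its class name and arguments: combine the module's class name, its
    (semantically relevant) parameters, and the hashes of all its inputs."""
    h = hashlib.sha256()
    h.update(module_class_name.encode("utf8"))
    for key in sorted(params):  # deterministic order
        h.update(repr((key, params[key])).encode("utf8"))
    for input_hash in input_hashes:
        h.update(input_hash.encode("utf8"))
    return h.hexdigest()


# E.g. a hypothetical Linear module with one input:
# module_hash("Linear", {"out_dim": 512, "with_bias": True}, ["<hash of the input>"])
```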
Some further thoughts and results from recent discussions: If we implement this, it would follow basically what @michelwi outlined above, similar to how hashing of Sisyphus jobs and their arguments works right now in Sisyphus. But this does not outline the whole picture. It is not just a single module and arguments to it. Currently it could look like:
It is not enough to just hash … We are now starting to get some first user experience (see #98) without any custom hashing logic, meaning that it is really the net dict only, and the dim tags are via their variable name (via Sis …).
Why is this a problem at all? Introducing custom hashing does not mean overriding the "global" net hashing in any way, but providing specific hashes (or rather: hash overrides) for some base modules/functions, just as you said.
So unless we are specifically "not" hashing them, they will always be hashed by default. So why is there any reason to be concerned about this? The hashing is a depth-first, top-down approach, always starting from the config dict "root", so all we would do is replace the default hashing with a custom one for some sub-trees/nodes, but never the global hashing itself.
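To make that concrete, a minimal sketch (pure Python, not the actual Sisyphus code) of depth-first hashing with per-node overrides, where `custom_hash()` is a hypothetical hook:

```python
import hashlib
from typing import Any


def config_hash(obj: Any) -> str:
    """Depth-first, top-down hashing starting from the config "root".
    Nodes that provide their own hash (here via a hypothetical custom_hash()
    hook) override the default only for their sub-tree; everything else
    keeps being hashed by default."""
    if hasattr(obj, "custom_hash"):
        return obj.custom_hash()
    h = hashlib.sha256()
    if isinstance(obj, dict):
        for key in sorted(obj):
            h.update(str(key).encode("utf8"))
            h.update(config_hash(obj[key]).encode("utf8"))
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            h.update(config_hash(item).encode("utf8"))
    else:
        h.update(repr(obj).encode("utf8"))
    return h.hexdigest()
```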
What do you mean by problem? There is no problem. It's just more complex.
It depends where and how you implement the hash. E.g. the solution I have in mind (but this is just one of many potential solutions) would be that you put some object or multiple objects into the config, which are similar to … Let's say this is a single object, which somehow represents … Now, how do you define the hash? This is what I'm discussing here (all the other technical things are not so relevant, only details). If there would be a single …
And then of course the … However, this is what I explained, it is not as simple. There is not a single … But there is no problem. Everything can be solved. I just said that it's a bit more complex than we might have thought initially.
I don't understand. I'm discussing here the case that we explicitly define the hash. But even if you don't explicitly define it, how will this be hashed? We are discussing here the case of not just using the net dict for the hash. So what is the hash then? This is what I'm discussing here.
This is all clear. How is that related to what I said? What do you put into the config? At what point would you define the custom hashing? For what objects exactly? We are discussing here the case of not using the net dict. So, what objects? As explained, just the object structure is not enough. When you change the code for losses, for other things, etc., this all needs to be reflected in the hash.
Another idea for a simple hashing scheme: we could just hash the code itself. Excluding all of returnn-common, and excluding other irrelevant things. So e.g. if the user has the model definition in a config, alongside other unrelated stuff, we should only use the model definition code itself. So, this is maybe the tricky part, to really get the relevant code. To figure out the needed classes and functions. Also, some details need to be clarified further. Is it the code …
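A rough sketch of what hashing the code itself could look like, assuming the relevant classes/functions have already been collected (which is exactly the tricky part mentioned above):

```python
import hashlib
import inspect


def code_hash(*objects) -> str:
    """Hash the source code of the given functions/classes.
    The hard part (not solved here) is collecting exactly the classes and
    functions the model definition depends on, while excluding all of
    returnn-common and other irrelevant code. Whether to normalize the
    source (whitespace, comments, ...) before hashing is also left open."""
    h = hashlib.sha256()
    for obj in objects:
        h.update(inspect.getsource(obj).encode("utf8"))
    return h.hexdigest()


# Usage sketch: only the user's own model definition code goes into the hash.
# sis_hash = code_hash(MyModel, my_loss_definition)
```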
Note, there is some ongoing discussion on this aspect here: rwth-i6/i6_experiments#63. In the current implementation, there does not need to be any custom logic in returnn_common at all, and this is all handled by the helpers in i6_experiments. I think we can close this issue here for now. We can reopen it when we think that we should do something on the returnn_common side about this.
When this becomes more widely used, the resulting net dicts will often also be used for Sisyphus hashes. This means that every minor change can lead to changed Sis hashes. So also things like the layer name heuristics, etc.
I have heard already about different opinions and preferences on this aspect, so returnn-common will not enforce anything.
I expect the net dicts to change quite often even when there is no semantic or logical change (e.g. just some layer name heuristic changed, without changing param name spaces though). And then the consequence is that people either don't update returnn-common (which is bad), end up with forks of returnn-common with only selected changes (even worse), or we are forced to not make changes anymore to the net dict unless really necessary, which will possibly restrict us or require ugly workarounds later or so (also not good).
Because of that, my original idea was to not use the resulting net dict but some other intermediate representation for Sis hashes. Kind of similar to how a Sisyphus `Job` object can also make some aspects of the Sis hash explicit, e.g. by overriding the `hash` function. However, this is not implemented yet, and this will probably also have some other drawbacks depending on the specific implementation. One concern was that people were afraid that actual semantic changes would possibly not lead to a changed Sis hash due to potential bugs in this implementation. Although my counter argument would be that this could be true for any Sisyphus `Job` with some custom `hash` logic (or even without, when it depends on external things). In any case, maybe we should think a bit about this before the first release.
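For illustration, a sketch of how a Sisyphus Job can make its hash explicit by overriding the `hash` classmethod; the job and its arguments here are made up, and the exact Sisyphus API should be checked:

```python
# Assumption: Sisyphus lets a Job override a `hash` classmethod over its
# parsed constructor arguments; verify against the actual Sisyphus API.
from sisyphus import Job, Task


class MyTrainingJob(Job):
    """Hypothetical job that makes parts of its Sis hash explicit."""

    def __init__(self, config, log_verbosity=3):
        self.config = config
        self.log_verbosity = log_verbosity  # should not influence the hash

    def tasks(self):
        yield Task("run")

    def run(self):
        pass  # actual work would go here

    @classmethod
    def hash(cls, parsed_args):
        # Only hash what we consider semantically relevant,
        # e.g. drop log_verbosity from the hash inputs.
        relevant = {k: v for k, v in parsed_args.items() if k != "log_verbosity"}
        return super().hash(relevant)
```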