Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging nodes with same name but different interfaces drops further nodes without telling #14

Closed
andres-fr opened this issue Sep 19, 2019 · 8 comments

Comments

@andres-fr
Copy link

andres-fr commented Sep 19, 2019

Hi yahooers! Thanks for this very helpful software. I noticed one issue at graph merging, seems like a bug. I have version 1.2.4

Scenario:

  • Define graph1 = abs(a + (a * b))**3, with operations named (mul1, sum1, abspow1)
  • Define graph2 = abs(a + (a * b))**2, with operations named (mul1, sum1, abspow2)
  • Merge via graphkit.compose(name="graph3", merge=True)(graph1, graph2)
  • Both sum1 operations have same name, but crucially different output names
  • abspow1 depends on sum1, and abspow2 depends on sum2

In this scenario, two options would be acceptable when evaluating the merged graph:

  1. Merge just mul1 (same name AND interface) and leave the rest as independent branches
  2. Merge also sum2 into sum1, and make abspow2 point to sum1`

But what is happening is that both the second sum1 and abspow2 are being dropped, without any warning or exception thrown. Also, the squared output of abspow2 is missing from the merged graph's output.

I assume the reason is that the merged graph can't fullfil abspow2's interface after sum1's "faulty" merge and is dropping it silently... Is this behaviour expected? The docs are pretty clear about the "merging by name process", but I feel like this scenario could be problematic if a user inadvertently messes up the interfaces, and whole branches of the graph are dropped.

In case this is unexpected, these could be 2 possible approaches:

  1. Maybe throwing something like an InterfaceConfictError and dismissing the merge (more conservative)
  2. Make all nodes that pointed to sum2 point to sum1 after they merge, regardless of their interface (would require some assumptions on the interface or potentially a redesign of the whole library into a statically typed, TF-ish ̶n̶i̶g̶h̶t̶m̶a̶r̶e̶ experience)

I uploaded a unittest with a reproducible example into this gist, It is in the test_graph_merge_dropping method. Let me know if I can be of any help via PR.

Cheers,
Andres

@ankostis
Copy link

ankostis commented Sep 29, 2019

These are the workflows, to facilitate discussion:

graph1 & graph2

merged: (graph1 + graph2), (graph2 + graph1)

graph1+grap2graph2+graph1

ankostis added a commit to ankostis/graphtik that referenced this issue Sep 29, 2019
@ankostis
Copy link

Diagrams above taken from a new TC in https://github.com/ankostis/graphkit/tree/fix14, assembled from the gist in the OP.

@andres-fr
Copy link
Author

Wait, so the merging is actually producing 2 graphs? I think I missed that one... how do you collect the second one? I can't find any difference between your test_graph_merge_dropping() and mine. Note that the fact that the method is passing is bad: it is asserting that nodes are being dropped from structure and computation...

@andres-fr
Copy link
Author

Even if both graphs are somehow gathered, I still see a problem: some_sum is not any value, but specifically the value returned by graph2.sum1.

If the merging operation is not able to keep this dependency, I still think it should terminate with an error or at least notify the user.

ankostis added a commit to ankostis/graphtik that referenced this issue Sep 30, 2019
@ankostis
Copy link

ankostis commented Sep 30, 2019

Updated diagrams, adding the inverse merge (graph2 + graph1).

@andres-fr I notice that the description in the OP does not accurately depict hte situation in the diagrams. Would you care to update it?

@andres-fr
Copy link
Author

@ankostis sure, but for that I would need to know how are you able to extract 2 graphs after merging. How are you collecting the second one? In my code I just get the "bigger" one and for that case the OP would describe the situation...

@ankostis
Copy link

ankostis commented Sep 30, 2019

I'm running py3.7, and maybe ordered dicts have something to do with it.
In any case, try running my latest branch, and modify the sources to plot the diagrams before executing the networks.

[edit]
Non-deterministic order of dictionaries prior to Python-3.6 (and of sets) is a real killer for DAG applications like this one. PY2's EOL comes around the corner, i hope all Python version prior to PY3.6 are included in the burial.

@andres-fr
Copy link
Author

andres-fr commented Oct 1, 2019

Yeah, I'm sorry but I will probably switch to dependency_injector. I like the idea that they structure the interface IDs through Python's own namespace and control flow system, thus avoiding the issues that arise when using strings as identifiers, like e.g. in this issue.

It gets some more effort to get into it, but then you can be sure that it just works. Without extending too much, I also love the possibility of creating singleton, factory and config types out of the box.

In any case, it is a different concept and not necessarily a competitor... I want to thank you very much for your very informative table and your great help with this project. If no further people jump in I am closing this issue.

Cheers!
Andres

ankostis added a commit to ankostis/graphtik that referenced this issue Oct 1, 2019
NOTE dict are not deterministic in <PY3.6.
So this commit would not improve determinism
in those pythons.

+ build: add `boltons` dependency for ndexedSet.
+ doc: mark all set usage if affect determinism.
+ e.g. see reproducibility problem in yahoo#14.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants