Merging nodes with same name but different interfaces drops further nodes without telling #14

andres-fr · 2019-09-19T05:10:47Z

Hi yahooers! Thanks for this very helpful software. I noticed one issue at graph merging, seems like a bug. I have version 1.2.4

Scenario:

Define graph1 = abs(a + (a * b))**3, with operations named (mul1, sum1, abspow1)
Define graph2 = abs(a + (a * b))**2, with operations named (mul1, sum1, abspow2)
Merge via graphkit.compose(name="graph3", merge=True)(graph1, graph2)
Both sum1 operations have same name, but crucially different output names
abspow1 depends on sum1, and abspow2 depends on sum2

In this scenario, two options would be acceptable when evaluating the merged graph:

Merge just mul1 (same name AND interface) and leave the rest as independent branches
Merge also sum2 into sum1, and make abspow2 point to sum1`

But what is happening is that both the second sum1 and abspow2 are being dropped, without any warning or exception thrown. Also, the squared output of abspow2 is missing from the merged graph's output.

I assume the reason is that the merged graph can't fullfil abspow2's interface after sum1's "faulty" merge and is dropping it silently... Is this behaviour expected? The docs are pretty clear about the "merging by name process", but I feel like this scenario could be problematic if a user inadvertently messes up the interfaces, and whole branches of the graph are dropped.

In case this is unexpected, these could be 2 possible approaches:

Maybe throwing something like an InterfaceConfictError and dismissing the merge (more conservative)
Make all nodes that pointed to sum2 point to sum1 after they merge, regardless of their interface (would require some assumptions on the interface or potentially a redesign of the whole library into a statically typed, TF-ish ̶n̶i̶g̶h̶t̶m̶a̶r̶e̶ experience)

I uploaded a unittest with a reproducible example into this gist, It is in the test_graph_merge_dropping method. Let me know if I can be of any help via PR.

Cheers,
Andres

The text was updated successfully, but these errors were encountered:

ankostis · 2019-09-29T21:57:59Z

These are the workflows, to facilitate discussion:

graph1 & graph2

merged: (graph1 + graph2), (graph2 + graph1)

ankostis · 2019-09-29T22:01:55Z

Diagrams above taken from a new TC in https://github.com/ankostis/graphkit/tree/fix14, assembled from the gist in the OP.

andres-fr · 2019-09-29T23:27:05Z

Wait, so the merging is actually producing 2 graphs? I think I missed that one... how do you collect the second one? I can't find any difference between your test_graph_merge_dropping() and mine. Note that the fact that the method is passing is bad: it is asserting that nodes are being dropped from structure and computation...

andres-fr · 2019-09-29T23:32:17Z

Even if both graphs are somehow gathered, I still see a problem: some_sum is not any value, but specifically the value returned by graph2.sum1.

If the merging operation is not able to keep this dependency, I still think it should terminate with an error or at least notify the user.

ankostis · 2019-09-30T09:18:15Z

Updated diagrams, adding the inverse merge (graph2 + graph1).

@andres-fr I notice that the description in the OP does not accurately depict hte situation in the diagrams. Would you care to update it?

andres-fr · 2019-09-30T15:31:15Z

@ankostis sure, but for that I would need to know how are you able to extract 2 graphs after merging. How are you collecting the second one? In my code I just get the "bigger" one and for that case the OP would describe the situation...

ankostis · 2019-09-30T15:42:04Z

I'm running py3.7, and maybe ordered dicts have something to do with it.
In any case, try running my latest branch, and modify the sources to plot the diagrams before executing the networks.

[edit]
Non-deterministic order of dictionaries prior to Python-3.6 (and of sets) is a real killer for DAG applications like this one. PY2's EOL comes around the corner, i hope all Python version prior to PY3.6 are included in the burial.

andres-fr · 2019-10-01T03:55:54Z

Yeah, I'm sorry but I will probably switch to dependency_injector. I like the idea that they structure the interface IDs through Python's own namespace and control flow system, thus avoiding the issues that arise when using strings as identifiers, like e.g. in this issue.

It gets some more effort to get into it, but then you can be sure that it just works. Without extending too much, I also love the possibility of creating singleton, factory and config types out of the box.

In any case, it is a different concept and not necessarily a competitor... I want to thank you very much for your very informative table and your great help with this project. If no further people jump in I am closing this issue.

Cheers!
Andres

NOTE dict are not deterministic in <PY3.6. So this commit would not improve determinism in those pythons. + build: add `boltons` dependency for ndexedSet. + doc: mark all set usage if affect determinism. + e.g. see reproducibility problem in yahoo#14.

ankostis added a commit to ankostis/graphtik that referenced this issue Sep 29, 2019

test(merge,yahoo#14): add merge TC from given gist, passing

a5c94c8

ankostis added a commit to ankostis/graphtik that referenced this issue Sep 30, 2019

test(merge,yahoo#14): add merge TC from given gist, passing

a56f127

andres-fr closed this as completed Oct 1, 2019

ankostis mentioned this issue Oct 1, 2019

FIX: use insertion-ordered sets for deterministic results #23

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merging nodes with same name but different interfaces drops further nodes without telling #14

Merging nodes with same name but different interfaces drops further nodes without telling #14

andres-fr commented Sep 19, 2019 •

edited

Loading

ankostis commented Sep 29, 2019 •

edited

Loading

ankostis commented Sep 29, 2019

andres-fr commented Sep 29, 2019

andres-fr commented Sep 29, 2019

ankostis commented Sep 30, 2019 •

edited

Loading

andres-fr commented Sep 30, 2019

ankostis commented Sep 30, 2019 •

edited

Loading

andres-fr commented Oct 1, 2019 •

edited

Loading

Merging nodes with same name but different interfaces drops further nodes without telling #14

Merging nodes with same name but different interfaces drops further nodes without telling #14

Comments

andres-fr commented Sep 19, 2019 • edited Loading

ankostis commented Sep 29, 2019 • edited Loading

graph1 & graph2

merged: (graph1 + graph2), (graph2 + graph1)

ankostis commented Sep 29, 2019

andres-fr commented Sep 29, 2019

andres-fr commented Sep 29, 2019

ankostis commented Sep 30, 2019 • edited Loading

andres-fr commented Sep 30, 2019

ankostis commented Sep 30, 2019 • edited Loading

andres-fr commented Oct 1, 2019 • edited Loading

andres-fr commented Sep 19, 2019 •

edited

Loading

ankostis commented Sep 29, 2019 •

edited

Loading

ankostis commented Sep 30, 2019 •

edited

Loading

ankostis commented Sep 30, 2019 •

edited

Loading

andres-fr commented Oct 1, 2019 •

edited

Loading