[RFC] Graph API

Subject: Graph API
Authors: Michael Copeland, Romain Dorgueil
Created: May 25, 2017
Modified: Jul 6, 2017
Target: 1.0
Status: Draft

The purpose of this page is to generate ideas around the implementation of the Graph API. Comments are welcome!

Comments:

@jelloslinger
- thumbs up on the current API (seems that is all you need for a minimum viable product)
- of all the suggestions below, in favor of Future Proposal #1 (operators)
- any convenience notation/operators/methods should use the current API under the hood. Most (if not all) features of the current API would need to be expressible.
- not sure if I'm a fan of using inheritance at this point
@hartym
- we need something that allows graphs to be defined in a logical order. The API should be intuitive to developers familiar with ETL and some programming.
- As an example in the current API, you need to explicitly specify which chain to "end first" if multiple chains are specified at input. The new proposal should make this more apparent.

The Graph API is very minimalist for now, and that was a deliberate choice: better not to implement too much before we know exactly what we want. Now that the standard library is filling out, let's think about how we can enhance this API.

Goals

- Better developer experience.
- Define graphs with forks and joins in a few different ways:
  - Graph subclass?
  - «bubble»
  - Factory
- There should be a way to reference any point in the graph, and the graph should be able to contain more than one node with the same value.
- There should be a way to extend a graph by inserting new nodes, removing nodes, or overriding nodes.
- Graph visualization (using graphviz).
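
For the visualization goal, here is a minimal sketch of turning a list of edges into Graphviz DOT text, the same format used by the diagrams in the examples on this page. The `to_dot` helper and the edge-list representation are assumptions made for illustration; they are not part of bonobo.

```python
def to_dot(edges, name="G", rankdir="LR"):
    """Render (source, target) name pairs as Graphviz DOT source text."""
    lines = ["digraph %s {" % name, "  rankdir = %s;" % rankdir]
    for source, target in edges:
        lines.append("  %s -> %s;" % (source, target))
    lines.append("}")
    return "\n".join(lines)


# A simple fork, written as explicit edges (node names only).
fork_edges = [("A", "B"), ("B", "C"), ("C", "C1"), ("B", "D"), ("D", "D1")]
dot_source = to_dot(fork_edges)
print(dot_source)
```

The resulting text can be pasted into any DOT renderer. A real implementation would presumably walk the graph's own node and edge storage instead of a hand-written list.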

Example 1: simple fork

https://g.gravizo.com/svg?digraph%20G%20{%20rankdir%20=%20LR;%20A%20-%3E%20B;%20B%20-%3E%20C%20-%3E%20C1;%20B%20-%3E%20D%20-%3E%20D1;%20}

digraph G {
  rankdir = LR;
  A -> B;
  B -> C -> C1;
  B -> D -> D1;
}

# Current API

graph = bonobo.Graph()

graph.add_chain(A, B)
graph.add_chain(C, C1, _input=B)
graph.add_chain(D, D1, _input=B)

# Future proposal 1, using operators

graph = bonobo.Graph()

graph += (A, B)
graph[B] += (C, C1)
graph[B] += (D, D1)

# ... or using inheritance

class MyGraph(bonobo.Graph):
    def setup(self):  # setup? init? how to pass arguments?
        self += (A, B)
        self[B] += (C, C1)
        self[B] += (D, D1)

# Future proposal 2:

graph = bonobo.Graph()
graph.append(A, B)  # implicit "BEGIN"
# ...or graph[BEGIN].append ?
graph[B].append(C, C1)
graph[B].append(D, D1)

# pro : this is "list-like"
# con : the same node can't appear twice in the graph, but maybe this can be overcome with some way to specify
#       which one we mean when there is ambiguity?
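
As a rough sketch of how the operator notation from proposal 1 could sit on top of an `add_chain`-style method (in line with the comment that convenience notation should use the current API under the hood), consider the toy class below. `SketchGraph`, `_Cursor` and the `BEGIN` string are hypothetical stand-ins, not bonobo's implementation; only the call shape of `add_chain(..., _input=...)` is taken from the examples above.

```python
BEGIN = "BEGIN"  # hypothetical stand-in for the graph's entry point


class SketchGraph:
    """Toy graph that only records edges between nodes."""

    def __init__(self):
        self.edges = []

    def add_chain(self, *nodes, _input=BEGIN):
        # Link the nodes in order, starting from _input.
        previous = _input
        for node in nodes:
            self.edges.append((previous, node))
            previous = node
        return self

    def __iadd__(self, nodes):
        # graph += (A, B)  ->  graph.add_chain(A, B)
        return self.add_chain(*nodes)

    def __getitem__(self, node):
        # graph[B] returns a cursor bound to B.
        return _Cursor(self, node)

    def __setitem__(self, node, cursor):
        # Python runs `graph[B] += x` as `graph[B] = graph[B].__iadd__(x)`,
        # so this hook must exist even though the cursor already did the work.
        pass


class _Cursor:
    """Remembers which node the next chain should be attached to."""

    def __init__(self, graph, node):
        self.graph = graph
        self.node = node

    def __iadd__(self, nodes):
        # graph[B] += (C, C1)  ->  graph.add_chain(C, C1, _input=B)
        self.graph.add_chain(*nodes, _input=self.node)
        return self


graph = SketchGraph()
graph += ("A", "B")
graph["B"] += ("C", "C1")
graph["B"] += ("D", "D1")
print(graph.edges)
```

One detail worth noting for the design: because of how augmented assignment on a subscript works in Python, `graph[B] += ...` needs both `__getitem__` and `__setitem__` on the graph, which is a small cost of the operator notation compared to the `append` variant.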

Example 2: simple join

https://g.gravizo.com/svg?digraph%20G%20{%20rankdir%20=%20LR;%20A1%20-%3E%20A2%20-%3E%20C;%20B1%20-%3E%20B2%20-%3E%20C;%20C%20-%3E%20D;%20}

digraph G {
  rankdir = LR;
  A1 -> A2 -> C;
  B1 -> B2 -> C;
  C -> D;
}

Current API

graph = bonobo.Graph()

graph.add_chain(C, D, _input=None, _name='trunk')
graph.add_chain(A1, A2, _output='trunk')
graph.add_chain(B1, B2, _output='trunk')

Future (ideas, not decided)

Imperative

graph = bonobo.Graph()

graph += (A1, A2)
graph += (B1, B2)
graph[(A2, B2)] += (C, D)  # ??? (tuple key = the tails of both chains, as in Example 3)

Example 3: bubble

https://g.gravizo.com/svg?digraph%20G%20{%20rankdir%20=%20LR;%20A%20-%3E%20B;%20B%20-%3E%20C1%20-%3E%20C2%20-%3E%20F;%20B%20-%3E%20D1%20-%3E%20D2%20-%3E%20F;%20B%20-%3E%20E1%20-%3E%20E2%20-%3E%20F;%20F%20-%3E%20G;%20}

digraph G {
  rankdir = LR;
  A -> B;
  B -> C1 -> C2 -> F;
  B -> D1 -> D2 -> F;
  B -> E1 -> E2 -> F;
  F -> G;
}

Current API

graph = bonobo.Graph()

graph.add_chain(A, B)
graph.add_chain(F, G, _input=None, _name='trunk')
graph.add_chain(C1, C2, _input=B, _output='trunk')
graph.add_chain(D1, D2, _input=B, _output='trunk')
graph.add_chain(E1, E2, _input=B, _output='trunk')

Future (ideas, not decided)

Imperative

graph = bonobo.Graph()

graph += (A, B)
graph[B] += (C1, C2)
graph[B] += (D1, D2)
graph[B] += (E1, E2)
graph[(C2, D2, E2)] += (F, G)  # ???
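
One possible reading of the tuple key in the last line: every node in the key feeds the head of the appended chain, so a fork (one predecessor) and a join (several predecessors) become the same operation. The sketch below works on a plain edge list; `append_chain` and its `after` parameter are hypothetical names used for illustration, not bonobo API.

```python
def append_chain(edges, chain, after=()):
    """Return a new edge list with `chain` appended.

    Every node listed in `after` gets an edge to the first node of `chain`.
    """
    head = chain[0]
    new_edges = list(edges)
    new_edges.extend((tail, head) for tail in after)
    new_edges.extend(zip(chain, chain[1:]))
    return new_edges


# Build the bubble: one trunk, three parallel branches, one join.
edges = append_chain([], ("A", "B"))
for branch in (("C1", "C2"), ("D1", "D2"), ("E1", "E2")):
    edges = append_chain(edges, branch, after=("B",))
edges = append_chain(edges, ("F", "G"), after=("C2", "D2", "E2"))

for source, target in edges:
    print(source, "->", target)
```

Under this reading, `graph[B] += chain` is just the single-predecessor case of `graph[(C2, D2, E2)] += chain`, which may help keep the notation consistent.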

Random notes and ideas

gf = GraphFactory()
gf |= foo | bar | baz
gf[foo] |= a | b | c

graph = gf()

Which operator? Pipes or plus; maybe also some way to have a "pillar" of transformations.
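
One practical point for the pipe notation: plain Python functions do not define `|`, so `foo | bar` raises `TypeError` unless at least one operand is a wrapper object. A minimal sketch of such a wrapper follows; `Chain` is a hypothetical name and the node functions are placeholders.

```python
class Chain:
    """Ordered sequence of nodes that can be extended with the | operator."""

    def __init__(self, *nodes):
        self.nodes = tuple(nodes)

    def __or__(self, other):
        # chain | node  or  chain | other_chain
        extra = other.nodes if isinstance(other, Chain) else (other,)
        return Chain(*self.nodes, *extra)

    def __ror__(self, other):
        # plain_callable | chain
        return Chain(other, *self.nodes)


# Placeholder transformations.
def foo(row):
    return row


def bar(row):
    return row


def baz(row):
    return row


# At least the first operand has to be wrapped for the pipe to work.
chain = Chain(foo) | bar | baz
print([node.__name__ for node in chain.nodes])
```

A factory along the lines of `gf |= foo | bar | baz` would therefore need either wrapped nodes or a wrapped first operand; `+` on tuples avoids this, at the cost of extra parentheses.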
