Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRAFT: feat: Prototype dataflow analysis for static circuit extraction #1664

Draft
wants to merge 16 commits into
base: acl/const_fold3
Choose a base branch
from

Conversation

doug-q
Copy link
Collaborator

@doug-q doug-q commented Nov 18, 2024

Based on #1603.

Includes an rough inliner and call graph.

See the new snapshots for current capabilities.

aborgna-q and others added 15 commits November 12, 2024 17:31
Metadata in `hugr-core` is attached to nodes and to `OpDef`s, consisting
of a map from string names to `serde_json::Value`s. On top of that,
`OpDef` also has a `description` field. This PR imports and exports node
metadata by serializing it to a JSON string and wrapping that string
with a `prelude.json` constructor. It also exports the metadata of
`OpDef`s, in which case the description field can be exported as a
string directly. This PR also introduces string escaping for the text
format (#1549).

By wrapping the metadata in a JSON type on the `hugr-model` side, we
leave open the option to have typed metadata via the usual term system
in the future.

Closes #1631.
Wraps the topoConvexChecker used by `SiblingSubgraph` in a `OnceCell` so
we can delay initialization until it's needed.

This lets us avoid traversing the graph (and computing a topological
order) when creating sibling subgraphs with zero or one nodes.

See benchmark results:
```
$ critcmp before after -f '.*subgraph'
group                      after                                  before
-----                      -----                                  ------
singleton_subgraph/10      1.00      3.0±0.08µs        ? ?/sec    2.71      8.1±0.18µs        ? ?/sec
singleton_subgraph/100     1.00      4.8±0.07µs        ? ?/sec    9.81     47.4±1.16µs        ? ?/sec
singleton_subgraph/1000    1.00     23.2±1.86µs        ? ?/sec    18.94  439.3±17.04µs        ? ?/sec
multinode_subgraph/10      1.01     17.9±0.25µs        ? ?/sec    1.00     17.7±0.29µs        ? ?/sec
multinode_subgraph/100     1.01    170.0±3.72µs        ? ?/sec    1.00    168.5±3.04µs        ? ?/sec
multinode_subgraph/1000    1.02      2.4±0.02ms        ? ?/sec    1.00      2.3±0.04ms        ? ?/sec
```
`singleton_subgraph` creates a single-node subgraph in a region with
around `k * 3` nodes.
`multinode_subgraph` creates a subgraph with 1/3rd of the nodes for the
same region.

This PR is quite noisy since it's adding those two new benchmarks, and
tidying up the files in the process.

drive-by: Add `bench = false` for all the targets in the workspace.
Otherwise, the auto-added test harness threw errors when passing
criterion flags to `cargo bench`.
return;
}

if ins.iter().any(|x| x == &PartialValue::Bottom) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this demonstrates exactly why I put in the test you comment on here !!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, any bottom inputs => all bottom outputs.

That we have to check this at the linked callsite of interpret_leaf_op and here is annoying. Perhaps interpret_leaf_op should only be called when there are no Bottom inputs?

@doug-q
Copy link
Collaborator Author

doug-q commented Nov 18, 2024

I think the 125 files changed is because I merged main. Annoying.

Copy link
Contributor

@acl-cqc acl-cqc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok... so my overall thought is this is not really a dataflow analysis. What you want is the part of dataflow analysis that does "join" at the exit of every conditional, and indeed on every control-flow merge; and possibly for TailLoops; but the things you want to join are circuits, not any kind of value that's per-qubit. (Because that join/merge logic currently only exists in dataflow analysis, "give a man a hammer and everything looks like a nail", but the other bits of the dataflow framework are really not helping here, and it'd be better to do something else - another datalog, probably, with just the bit you need. If you really want analysis like "only these cases are reachable" we should determine that using DF and then simplify the Hugr with this knowledge.)

Also, you need to do some much-more-complex factoring of classical values so that you can catch "quantum-part-of-circuit same, classical part different" which you clearly want to do. And there is the isomorphism concern....

So, overall, afraid I still think that the best approach is to say flat circuits (no nested control flow) are statically scheduled; and then we transform into that. That lets us deploy a wide range of different optimizations to get there, perhaps including this or something like it, but with flexibility to choose an optimization according to what works on a particular circuit. (I slightly favour a rewriting approach but don't mean to rule this approach out.)

}
} else {
assert_eq!(qb_outs.len(), qb_ins.len(), "{e:?}");
let mb_in_qbs = qb_ins
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am finding this quite hard to follow and this is the critical part of the algorithm, so if I've got the wrong end of the stick, my conclusions may be....entirely wrong :). A few comments wouldn't hurt....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking, you've certainly identified the roughest area. Very much WIP.

hugr-passes/src/static_circuit.rs Outdated Show resolved Hide resolved
.collect::<Option<HashMap<_, _>>>();
let gating_qbs = mb_in_qbs
.iter()
.flat_map(|hash_map| hash_map.iter().map(|(_, qbh)| qbh.qb_index()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooof. So is that equivalent to
mb_in_qbs.unwrap_or_default().values().map(QbHistory::qb_index) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the "gating_nodes" of a new node is the ordered list of input gates, right? That could go in a comment on Gate::gating_nodes.

I'm slightly concerned that Node (rather than (Node, Port)) is sufficiently accurate - the wire coming from one or another port sounds like an important difference - I think the counterargument is that if a different port were used, because of linearity other places would see different Node instead, so we might be OK.

.collect_vec();
let qb_out_to_in: HashMap<_, _> = zip_eq(qb_outs, qb_ins).collect();
for (i, v) in outs.iter_mut().enumerate() {
let mb_join_val = (|| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The no-args-closure-that-you-immediately-call is so that you can use ? within it to exit just the closure, right?

Consider using continue in the successful case, and then join-ing with Top if you haven't continued

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even

let mut mb_join_val = PartialValue::Top;
if .... {
  if .... {
     ....
     mb_join_val = qbh.into();
   }
}
v.join(mb_join_val);

for (i, v) in outs.iter_mut().enumerate() {
let mb_join_val = (|| {
let in_p = qb_out_to_in.get(&OutgoingPort::from(i))?;
let hash_map = mb_in_qbs.as_ref()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if you did unwrap_or_default when constructing mb_in_qbs, so you had an empty map rather than None, then the get(...)? on the line below would do both ?s in one go

Some(qbh.into())
})();

v.join_mut(mb_join_val.unwrap_or(PartialValue::Top));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short comment - if the corresponding in-port has a QbHistory, push this gate (+ ordered list of its inputs) onto that QbHistory...

But, if this in-port is classical, join with Top. Doesn't this hit 3.py where you have two different classical values?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is to always propagate TOP for classical outputs. In StaticCircuitPass we only look at linear QB_T wires, so I don't think it matters that classical wires are TOP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does one proceed in that case (given a TOP)? Is the goal here just to identify cases which could be statically scheduled if we then did a separate transformation pass?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOP only goes on the classical wires. If the circuit structure depends on those top values then we will fail to extract a static circuit.

Once we've extracted a static circuit, we should schedule it, then insert it at the "end" of the hugr(whatever that means). We'll need to wire up the angles. Then delete the original gate ops.

Does this answer your question?

g
}
pub fn join(mut self, other: Self) -> Option<Self> {
if self.gate != other.gate || self.gating_nodes != other.gating_nodes {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we require gating_nodes to be exactly equal?

Doesn't this rule out cases like a conditional where each Case contains "the same" circuit, as the Node IDs will be different? You'd need some kind of isomorphism....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We identify qubits by the node id of the qalloc. "gating_nodes" is a terrible name, it's the set of qubits we are operating on. This set is the same in two cases of a conditional.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think gating_nodes gets the node-id of the gate added, not the node id of the qalloc (/function input)?

@doug-q
Copy link
Collaborator Author

doug-q commented Nov 18, 2024

Ok... so my overall thought is this is not really a dataflow analysis. What you want is the part of dataflow analysis that does "join" at the exit of every conditional, and indeed on every control-flow merge; and possibly for TailLoops; but the things you want to join are circuits, not any kind of value that's per-qubit. (Because that join/merge logic currently only exists in dataflow analysis, "give a man a hammer and everything looks like a nail", but the other bits of the dataflow framework are really not helping here, and it'd be better to do something else - another datalog, probably, with just the bit you need. If you really want analysis like "only these cases are reachable" we should determine that using DF and then simplify the Hugr with this knowledge.)

I disagree. What is your criteria for being a dataflow analysis? Mine is: A well defined join and transfer function.

I have pushed and the test cases are now working. I had to add handling for "arguments of public functions are top". I do not claim that my interpret_leaf_op is understandable yet! (but it seems mostly correct)

@acl-cqc
Copy link
Contributor

acl-cqc commented Nov 18, 2024

What is your criteria for being a dataflow analysis?

An abstract value per wire (perhaps only a subset of wires, i.e. only those of type Qubit). Here you want an abstract value per circuit (per sibling subgraph).

I think the analysis framework you want is:

  • client provides a type of values V. Yes, these will need join.
  • client provides a function interpret_dfg(parent: Node, child_results: Map<Node, V>) -> V which is called with a map whose keys are the container nodes in the dfg.
    • For "nested DFG" nodes, the value in the map is the result of calling interpret_dfg, so we could rely on interpret_dfg to recursively call itself, but I think this just makes the rules more complicated
    • The framework handles Conditional nodes by recursing on each child Case, and then joining the resulting V's
    • The framework handles CFG nodes similarly by merging at CF joins and copying at control-flow splits. (There is some value in doing Tag/Value/Sum analysis on wires here but I think that could be done separately as a first step)
    • The framework handles TailLoop's...somehow (I mean, any quantum op in a tailloop will report "no static circuit" unless it's a never-iterates loop, right?)

The use-case here then instantiates V == Circuit == Hugr<RootNode=DFG>, and interpret_dfg inlines the map (replacing each container node keyed in the Map with the DFG value from the Map)

I think this is necessary to handle qalloc/qfree which are not represented in the outputs (where you suggest "this might require Dense Analysis")

@acl-cqc
Copy link
Contributor

acl-cqc commented Nov 18, 2024

Ok, so qalloc/qfree could be represented in the containing region's outputs mentioning (in their gating_nodes) 2-qubit ops with the ancillae. But then you've got only a nodeindex, not a description of the op (and the history of the ancilla), so if you had a conditional which in each arm did qalloc, ..., qfree with (case 1) the same, or (case 2) different ops, to combine the ancilla with the conditional's input-and-output qubit, we still need to do some graph traversal (not here) to identify or distinguish the two arms?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants