Identification algorithms #82

mschauer · 2023-06-14T17:57:37Z

@mwien I made a PR to be able to comment and assist.

Closes #61

mwien · 2023-06-14T21:30:25Z

Thanks, much appreciated :)
Feedback is very welcome!

mschauer · 2023-06-15T11:40:19Z

src/gensearch.jl

+        end
+    end
+    foreach(s -> genvisit(g, s, INIT), S)
+    return Set([x for x in 1:nv(g) if visited[3*x-2] || visited[3*x-1] || visited[3*x]])


Is this line called often?

It is called once per call of gensearch.

gensearch returns the set of all visited vertices. Imho this is the most general/flexible way to do it.

In the context of d-separation, one call of gensearch with S = X (and the right pass function) will return the set of all vertices d-connected with X. To test whether Y is d-separated from X one could then check whether the intersection of the returned set and Y is empty.

Alternatives would be:
(i) to return a boolean vector of visited vertices (less overhead, but also not so convenient to work with)
(ii) to write a second gensearch function which tests reachability (e.g., in the d-separation example it would directly return a boolean). This could be used in cases where the whole set of visited vertices is not needed.
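To make the "returned set, then intersect with Y" idea concrete, here is a minimal sketch that uses only Base. It is not the PR's `gensearch`: the name `dconnected`, the edge-list input, and the `:up`/`:down` states are assumptions made for illustration, and `X` is assumed to be disjoint from `Z`.

```julia
# Hypothetical stand-in for gensearch (NOT the PR's code): returns all vertices
# that are d-connected to X given Z in a DAG on vertices 1:n.
function dconnected(n, edges, X, Z)
    parents  = [Int[] for _ in 1:n]
    children = [Int[] for _ in 1:n]
    for (u, v) in edges
        push!(children[u], v)
        push!(parents[v], u)
    end
    # Ancestors of Z (including Z itself): colliders are open exactly at these vertices.
    anc = Set{Int}()
    stack = collect(Z)
    while !isempty(stack)
        v = pop!(stack)
        v in anc && continue
        push!(anc, v)
        append!(stack, parents[v])
    end
    # Search over (vertex, direction) states.
    # :up = reached from a child (or start vertex), :down = reached from a parent.
    visited = Set{Tuple{Int,Symbol}}()
    reached = Set{Int}()
    queue = [(x, :up) for x in X]
    while !isempty(queue)
        v, dir = pop!(queue)
        (v, dir) in visited && continue
        push!(visited, (v, dir))
        v in Z || push!(reached, v)
        if dir == :up && !(v in Z)
            for p in parents[v]; push!(queue, (p, :up)); end
            for c in children[v]; push!(queue, (c, :down)); end
        elseif dir == :down
            if !(v in Z)            # chain: continue downwards
                for c in children[v]; push!(queue, (c, :down)); end
            end
            if v in anc             # open collider: turn around
                for p in parents[v]; push!(queue, (p, :up)); end
            end
        end
    end
    return reached
end

# d-separation test as described: intersect the returned set with Y.
chain = [1 => 2, 2 => 3]
sep_given_2 = isempty(intersect(dconnected(3, chain, [1], Set([2])), [3]))
```

Returning the whole set keeps the caller flexible; a boolean-only variant could stop as soon as a vertex of Y is reached.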

Okay I think it’s fine

Suggested change

-    return Set([x for x in 1:nv(g) if visited[3*x-2] || visited[3*x-1] || visited[3*x]])
+    return Set(x for x in 1:nv(g) if visited[3*x-2] || visited[3*x-1] || visited[3*x])

would make sense.
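A small check that both forms build the same set; the generator form only skips the temporary array. The `visited` vector below is made up for the example and uses the three-slots-per-vertex layout from the diff.

```julia
n = 4
visited = falses(3n)        # three state slots per vertex, as in the diff
visited[3*2 - 1] = true     # vertex 2 visited in its second state
visited[3*4] = true         # vertex 4 visited in its third state

# Comprehension: materializes a Vector first, then copies it into a Set.
a = Set([x for x in 1:n if visited[3x-2] || visited[3x-1] || visited[3x]])
# Generator: feeds elements into the Set directly.
b = Set(x for x in 1:n if visited[3x-2] || visited[3x-1] || visited[3x])
```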

mschauer · 2023-07-03T07:37:52Z

What is the status?

mschauer · 2023-07-03T07:39:57Z

We might even wait for JuliaGraphs/Graphs.jl#266

mwien · 2023-07-04T17:09:15Z

We now have the functions find, find_min and list for the adjustment criterion and for the back-door criterion.

The generalization of the back-door criterion to sets $X$ and $Y$ originally given by Pearl is imo a bit restrictive (the back-door criterion has to hold for all pairs $x \in X$ and $y \in Y$) and makes it hard to write an efficient algorithm, so I did this a bit differently and more naturally: the set $Z$ satisfies the back-door criterion if (i) it contains no descendants of $X$ and (ii) $(X \perp Y \mid Z)$ holds in $G$ with the outgoing edges of $X$ removed.
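Written as a formula, the condition above reads as follows. This only restates the sentence; the notation $\mathrm{De}(X)$ for the descendants of $X$ and $G_{\underline{X}}$ for $G$ with all edges leaving $X$ removed is assumed here and is not taken from the code.

$$
Z \cap \mathrm{De}(X) = \emptyset
\quad\text{and}\quad
(X \perp Y \mid Z)_{G_{\underline{X}}}
$$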

Generally, I would be in favor of only having the adjustment criterion functions. Dagitty also does it like this. The criterion is sound and complete for adjustment, is well-studied from the algorithmic side and in any case the differences are minor.

E.g. if $X$ and $Y$ are singletons, then a graph has a set satisfying the adjustment criterion iff it has a set satisfying the back-door criterion. Moreover, in this case the minimal and minimum size sets are identical.

Finally, we might also want to add stuff on efficient adjustment sets (those with smallest asymptotic variance) and as far as I know those are based on the adjustment criterion as well.

Having both functions could also be confusing.

mschauer · 2023-07-06T07:19:20Z

You can either kick them out or add a sentence to the docstring: “Provided for comparison only”.

mschauer · 2023-07-06T08:44:31Z

For the docs, we might change https://mschauer.github.io/CausalInference.jl/latest/examples/backdoor_example/ to the new tools. By the way, is it correct that there Set([3]) is an adjustment set?

mwien · 2023-07-06T09:01:30Z

> For the docs, we might change https://mschauer.github.io/CausalInference.jl/latest/examples/backdoor_example/ to the new tools.

Sounds good. If we explain the difference between the back-door criterion and the adjustment criterion in the docs, I'm also fine with having implementations for both. It might actually be a good idea to explicitly discuss the differences instead of just giving a function for one of them.

> By the way, is it correct that there Set([3]) is an adjustment set?

No, that does not seem correct :/ Which function outputs this?
I get

julia> dag = digraph([
           1 => 3
           3 => 6
           2 => 5
           5 => 8
           6 => 7
           7 => 8
           1 => 4
           2 => 4
           4 => 6
           4 => 8])
{8, 10} directed simple Int64 graph

julia> for s in list_covariate_adjustment(dag, Set(6), Set(8))
           println(s)
       end
Set([4, 1])
Set([4, 3])
Set([4, 3, 1])
Set([4, 2])
Set([4, 2, 1])
Set([4, 2, 3])
Set([4, 2, 3, 1])
Set([5, 4])
Set([5, 4, 1])
Set([5, 4, 3])
Set([5, 4, 3, 1])
Set([5, 4, 2])
Set([5, 4, 2, 1])
Set([5, 4, 2, 3])
Set([5, 4, 2, 3, 1])

And that appears to be fine...

mschauer · 2023-07-06T09:10:58Z

Sorry for the noise, I looked at nodes 4 and 6 in error.

mschauer · 2023-07-06T09:22:44Z

This is really cool and quite useable

dag = digraph([1 => 3, 3 => 6, 2 => 5, 5 => 8, 6 => 7, 7 => 8, 1 => 4, 2 => 4, 4 => 6, 4 => 8])
println.(list_covariate_adjustment(dag, Set([6]), Set([8]), Set(Int[]), setdiff(Set(1:8), [1,2])));

Set([4, 3])
Set([5, 4])
Set([5, 4, 3])

PS: As test:

g = digraph([1 => 3, 3 => 6, 2 => 5, 5 => 8, 6 => 7, 7 => 8, 1 => 4, 2 => 4, 4 => 6, 4 => 8])
@test Set(list_covariate_adjustment(g, Set([6]), Set([8]), Set(Int[]), setdiff(Set(1:8), [1,2,6,8]))) == Set([Set([3,4]), Set([4,5]), Set([3,4,5])])
# or if this also should work
@test Set(list_covariate_adjustment(g, Set([6]), Set([8]), Set(Int[]), setdiff(Set(1:8), [1,2]))) == Set([Set([3,4]), Set([4,5]), Set([3,4,5])])

mwien · 2023-07-06T20:02:52Z

Next TODOs are:

- making it easier to call some of the functions (e.g. that they can also be called with Ints or Vectors instead of only Sets) -> ✅
- better documentation of the functions
- polishing parts of the code ✅
- adapt and extend docs section "Reasoning about experiments" -> separate PR

Hope I find some time tomorrow and next week for it

mschauer · 2023-07-07T12:50:31Z

I agree with the TODOs. The "Reasoning about experiments" can be done in a separate PR if we want to be a bit more incremental.

mwien · 2023-07-07T13:21:31Z

> I agree with the TODOs. The "Reasoning about experiments" can be done in a separate PR if we want to be a bit more incremental.

Agreed, that's better

mschauer · 2023-07-10T06:48:34Z

src/gensearch.jl

+List all d-separators `Z` with `I subseteq Z subseteq R` for sets of vertices `X` and `Y` in `g`. 
+"""
+function list_dseps(g, X, Y, I = Set{eltype(g)}(), R = setdiff(Set(vertices(g)), X, Y))
+    X, Y, I, R = toset.((X, Y, I, R))


E.g. the cast on X and Y is not necessary?

You mean for list functions specifically or also in other places?

Here `find_dsep` would cast X at every call (which happens during `list_dseps`) if we don't do the cast here. It's true that `find_dsep` also doesn't really need X to be a Set; however, e.g. `find_covariate_adjustment` and other functions do (because membership `v in X` is tested repeatedly), so at that point I thought it was cleanest to cast everything.
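A sketch of what such a one-time cast could look like. The PR's `toset` is not shown in this thread, so the helper below is named `as_set` and is purely an assumption about its behaviour.

```julia
# Hypothetical helper (the real `toset` in the PR may differ).
as_set(x::AbstractSet) = x        # already a set: returned as-is, no copy
as_set(x::Integer) = Set([x])     # single vertex
as_set(x) = Set(x)                # any other iterable, e.g. a Vector or range

# Broadcasting over a tuple converts all arguments in one line,
# mirroring `X, Y, I, R = toset.((X, Y, I, R))` from the diff.
X, Y = as_set.((6, [8, 7]))

s = Set([1, 2])
same_object = as_set(s) === s
```

Because sets pass through unchanged, casting once in the outer function makes the inner casts free.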

mschauer · 2023-07-10T06:50:54Z

Have you seen https://docs.julialang.org/en/v1/manual/style-guide/#Handle-excess-argument-diversity-in-the-caller ? I think it doesn't really apply here but in principle widening


 struct ConstraintIterator{T<:Integer, S, U<:AbstractSet{T}, F<:Function}
     g::SimpleDiGraph{T}
     X::S
     Y::S
     I::U
     R::U
     find::F
 end

and trusting the iteration to generate errors for unsupported types should work. I think here it doesn't matter at all that X and Y are of a certain type, as long as the methods we need in the iterator are defined.

mwien · 2023-07-10T11:11:23Z

> Have you seen https://docs.julialang.org/en/v1/manual/style-guide/#Handle-excess-argument-diversity-in-the-caller ? I think it doesn't really apply here but in principle widening
>
>     struct ConstraintIterator{T<:Integer, S, U<:AbstractSet{T}, F<:Function}
>         g::SimpleDiGraph{T}
>         X::S
>         Y::S
>         I::U
>         R::U
>         find::F
>     end
>
> and trusting the iteration to generate errors for unsupported types should work. I think here it doesn't matter at all that X and Y are of a certain type, as long as the methods we need in the iterator are defined.

Agreed, that's better and more general.

I would still prefer to also cast X and Y in list_dseps etc., because it is more efficient for our use case (if we didn't cast there, X and Y would be cast again on every call of find = find_dsep).

mwien · 2023-07-10T11:33:33Z

Another solution (instead of having a cast at the beginning of every function) would be to add X::Set{eltype(g)} etc. to every function and then have a wrapper function which accepts other types, casts to Set, and calls the typed function.
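Roughly, that wrapper pattern could look like this. `count_members` is a made-up toy function used only to show the dispatch; none of the PR's functions are reproduced here.

```julia
# Inner method: only accepts the internal Set representation.
count_members(X::Set{Int}, vs) = count(v -> v in X, vs)

# Wrapper: accepts anything iterable (Int, Vector, range, ...),
# converts once, then forwards to the typed method above.
count_members(X, vs) = count_members(Set{Int}(X), vs)

from_vector = count_members([1, 2], 1:3)
from_int    = count_members(2, 1:3)
from_set    = count_members(Set([1, 2, 3]), 1:3)
```

Dispatch picks the `Set{Int}` method whenever a set is passed, so the wrapper never calls itself.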

Generally I think casting isn't so bad here, because the Sets are basically our internal representation for efficiency. We could have something else than Sets, like a special kind of bitvector representation, which supports fast membership queries and union/intersect etc. Then the first line of every function would also be to convert the arguments to this representation.

Anyway, I'm very open to suggestions because I'm not sure what the cleanest way is...

mschauer · 2023-07-10T11:37:58Z

> prefer to also cast X and Y in list_dseps

I am fine with that

mwien · 2023-07-10T13:24:34Z

I think I'm reasonably happy with the code for now :)

I'm having problems building the docs locally right now (I haven't done that before), so I wasn't able to check how the formatting looks yet.

But once I fix that, I think we could merge soon.

mschauer · 2023-07-10T15:20:18Z

Can I just merge? Then we can see the docs.

mwien · 2023-07-10T15:20:52Z

Go for it!

mwien · 2023-07-10T15:40:12Z

Awesome :)

> Having problems right now to build the docs locally (haven't done that before), so I wasn't able to check how the formatting looks yet.

I didn't realize docs/src/library.md needs to be modified...
well, that's why I didn't see anything 😅

Currently not on the PC, could add this later (and try locally first)

mwien · 2023-07-10T17:13:56Z

I figured out how to work on the docs locally now. Opened a new PR for updating the docs: #85

We can also work on a new Example section there (or in a separate PR if we want a quick merge)...

mschauer · 2023-07-11T09:54:50Z

Out of curiosity: is all functionality canonical (e.g. available in DAGitty) or did you sneak in your own discoveries and innovations in here?

mwien · 2023-07-11T12:45:00Z

In principle all the functionality is in DAGitty as well. The front-door adjustment algorithm is very new, but it has also been added to the experimental version of DAGitty (I think no other package has it yet).

Still, I wrote everything from scratch. The main thing about my implementation is that it uses gensearch throughout (I also modified and simplified this a bit to my liking). In DAGitty there is a similar function, but for historical reasons I think it's not used everywhere. Credit goes to my colleague Benito van der Zander, who (I think) was the one that came up with the idea and convinced me that it's the cleanest way to proceed.

I think it really makes the code much more maintainable and I was able to code all the functions in a few afternoons.

Because I have a slightly different version of gensearch, I had to modify some minor things.

Then, there is the back-door adjustment for sets (in DAGitty there is no function for the classic back-door criterion afaik, only the adjustment criterion), where the "standard" generalization to sets (e.g. as given by Pearl) is imo overly restrictive (tbh I don't really understand why it's stated that way).

I think the way I implemented the listing algorithms with iterators is pretty neat. Every other package generates a vector of all (adjustment) sets, which might use a lot of memory or not terminate.
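To illustrate the lazy-listing point with plain Base Julia: the type below is a toy, not the PR's `ConstraintIterator`. The name `SupersetIterator` and the bitmask enumeration are assumptions for the example; it simply yields every set between `I` and `I ∪ free` on demand.

```julia
# Toy iterator (hypothetical): lazily yields all sets Z with I ⊆ Z ⊆ I ∪ free.
struct SupersetIterator{T}
    I::Vector{T}      # elements every yielded set must contain
    free::Vector{T}   # elements that may or may not be added
end

# The number of results is not known up front, so `collect` must not ask for `length`.
Base.IteratorSize(::Type{<:SupersetIterator}) = Base.SizeUnknown()
Base.eltype(::Type{SupersetIterator{T}}) where {T} = Set{T}

function Base.iterate(it::SupersetIterator{T}, mask = 0) where {T}
    mask >= 1 << length(it.free) && return nothing   # all subsets enumerated
    s = Set{T}(it.I)
    for (k, v) in enumerate(it.free)
        (mask >> (k - 1)) & 1 == 1 && push!(s, v)    # bit k-1 selects free[k]
    end
    return s, mask + 1
end

it = SupersetIterator([4], [1, 2, 3, 5])
firsttwo = collect(Iterators.take(it, 2))   # only two sets are ever built
```

A caller that needs just one or two sets pays only for those, which is the advantage over returning a full vector.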

mwien and others added 16 commits January 25, 2023 12:50

initial commit identification algorithms

69fb4c8

start adding gensearch

73003f5

Merge branch 'mschauer:master' into idalgorithms

054194e

add gensearch and first algorithms

536625d

structure gensearch file

e957bf3

added min stuff plus frontdoor

0969229

added listing algorithms for adjustment sets

679b015

started testing

6ae0a86

introduce veto function

304cb95

code golfing

e006d33

further testing and polishing

32229da

started with testing

d0ac2b4

testing

dfbbb7e

first draft done

4e558e9

minor changes

cccef28

work on documentation

ce3284f

mwien added 4 commits June 15, 2023 20:58

fix comment

20d1b9a

change strings to symbols

d1851ae

implement simple version of Chickering's dag-to-cpdag

e7e0a98

fix error in listing algorithms e.g. for backdoor and frontdoor

8312c0b

mwien and others added 3 commits June 25, 2023 12:39

typo

e8fdc0b

new topological sort

e5ee3b0

add summary for run-time discussion on cpdag(g)

c843d8f

mschauer mentioned this pull request Jul 6, 2023

Create CITATION.cff #84

Merged

Marcel Wienöbst added 2 commits July 6, 2023 12:07

start adding second test set

e82caa4

second test set done

3f9b6b5

make functions wallable with ints and vectors

e4c33a8

polish code

f5d84dd

polishing docs

a47e6ca

mschauer merged commit fce71ee into mschauer:master Jul 10, 2023
4 checks passed
