Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identification algorithms #82

Merged
merged 31 commits into from
Jul 10, 2023
Merged

Identification algorithms #82

merged 31 commits into from
Jul 10, 2023

Conversation

mschauer
Copy link
Owner

@mschauer mschauer commented Jun 14, 2023

@mwien I made a PR to be able to comment and assist.

Closes #61

@mwien
Copy link
Collaborator

mwien commented Jun 14, 2023

Thanks, much appreciated :)
Feedback is very welcome!

src/gensearch.jl Outdated
end
end
foreach(s -> genvisit(g, s, INIT), S)
return Set([x for x in 1:nv(g) if visited[3*x-2] || visited[3*x-1] || visited[3*x]])
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line called often?

Copy link
Collaborator

@mwien mwien Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is called once per call of gensearch.

gensearch returns the set of all visited vertices. Imho this is the most general/flexible way to do it.

In the context of d-separation, one call of gensearch with S = X (and the right pass function) will return the set of all vertices d-connected with X. To test whether Y is d-separated from X one could then check whether the intersection of the returned set and Y is empty.

Alternatives would be:
(i) to return a boolean vector of visited vertices (less overhead, but also not so convenient to work with)
(ii) to write a second gensearch function which tests reachability (e.g., in the d-separation example it would directly return a boolean). This could be used in cases where the whole set of visited vertices is not needed.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think it’s fine

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return Set([x for x in 1:nv(g) if visited[3*x-2] || visited[3*x-1] || visited[3*x]])
return Set(x for x in 1:nv(g) if visited[3*x-2] || visited[3*x-1] || visited[3*x])

would make sense.

src/gensearch.jl Outdated Show resolved Hide resolved
src/gensearch.jl Outdated Show resolved Hide resolved
src/cpdag.jl Outdated Show resolved Hide resolved
@mschauer
Copy link
Owner Author

mschauer commented Jul 3, 2023

What is the status?

@mschauer
Copy link
Owner Author

mschauer commented Jul 3, 2023

We might even wait for JuliaGraphs/Graphs.jl#266

@mwien
Copy link
Collaborator

mwien commented Jul 4, 2023

We now have the functions find, find_min and list for the adjustment criterion and for the back-door criterion.

The generalization of back-door to sets $X$ and $Y$ originally given by Pearl is imo a bit restrictive (the backdoor criterion has to hold for all pairs $x \in X$ and $y \in Y$) and hard for writing an efficient algorithm, so I did this a bit differently and more naturally (the set $Z$ satisfies the back-door criterion if (i) it contains no descendants of $X$ and (ii) it holds $(X \text{ indep } Y | Z)$ in $G$ with outgoing edges of the set $X$ removed).

Generally, I would be in favor of only having the adjustment criterion functions. Dagitty also does it like this. The criterion is sound and complete for adjustment, is well-studied from the algorithmic side and in any case the differences are minor.

E.g. if $X$ and $Y$ are singleton, then a graph has a set satisfying the adjustment criterion iff it has a set satisfying the back-door criterion. Moreover, in this case the minimal and minimum size sets are identical.

Finally, we might also want to add stuff on efficient adjustment sets (those with smallest asymptotic variance) and as far as I know those base on the adjustment criterion as well.

Having both functions could also be confusing.

@mschauer
Copy link
Owner Author

mschauer commented Jul 6, 2023

We you can either kick them out or add a sentence to the doc string “Provided for comparison only”

@mschauer
Copy link
Owner Author

mschauer commented Jul 6, 2023

For the docs, we might change https://mschauer.github.io/CausalInference.jl/latest/examples/backdoor_example/ to the new tools. By the way, is it correct that there Set([3]) is an adjustment set?

@mwien
Copy link
Collaborator

mwien commented Jul 6, 2023

For the docs, we might change https://mschauer.github.io/CausalInference.jl/latest/examples/backdoor_example/ to the new tools.

Sounds good. If we explain the difference between back-door criterion and adjustment criterion in the docs I'm also fine with having implementations for both. Might actually be a good idea to explicitly discuss the differences instead of just give a function for one of them.

By the way, is it correct that there Set([3]) is an adjustment set?

No, that does not seem correct :/ Which function outputs this?
I get

julia> dag = digraph([
           1 => 3
           3 => 6
           2 => 5
           5 => 8
           6 => 7
           7 => 8
           1 => 4
           2 => 4
           4 => 6
           4 => 8])
{8, 10} directed simple Int64 graph

julia> for s in list_covariate_adjustment(dag, Set(6), Set(8))
           println(s)
       end
Set([4, 1])
Set([4, 3])
Set([4, 3, 1])
Set([4, 2])
Set([4, 2, 1])
Set([4, 2, 3])
Set([4, 2, 3, 1])
Set([5, 4])
Set([5, 4, 1])
Set([5, 4, 3])
Set([5, 4, 3, 1])
Set([5, 4, 2])
Set([5, 4, 2, 1])
Set([5, 4, 2, 3])
Set([5, 4, 2, 3, 1])

And that appears to be fine...

@mschauer
Copy link
Owner Author

mschauer commented Jul 6, 2023

Sorry for the noise, I looked at nodes 4 and 6 in error.

@mschauer
Copy link
Owner Author

mschauer commented Jul 6, 2023

This is really cool and quite useable

dag = digraph([1 => 3, 3 => 6, 2 => 5, 5 => 8, 6 => 7, 7 => 8, 1 => 4, 2 => 4, 4 => 6, 4 => 8])
println.(list_covariate_adjustment(dag, Set([6]), Set([8]), Set(Int[]), setdiff(Set(1:8), [1,2])));
Set([4, 3])
Set([5, 4])
Set([5, 4, 3])

PS: As test:

g = digraph([1 => 3, 3 => 6, 2 => 5, 5 => 8, 6 => 7, 7 => 8, 1 => 4, 2 => 4, 4 => 6, 4 => 8])
@test Set(list_covariate_adjustment(g, Set([6]), Set([8]), Set(Int[]), setdiff(Set(1:8), [1,2,6,8]))) == Set([Set([3,4]), Set([4,5]), Set([3,4,5])])
# or if this also should work
@test Set(list_covariate_adjustment(g, Set([6]), Set([8]), Set(Int[]), setdiff(Set(1:8), [1,2]))) == Set([Set([3,4]), Set([4,5]), Set([3,4,5])])

@mschauer mschauer mentioned this pull request Jul 6, 2023
@mwien
Copy link
Collaborator

mwien commented Jul 6, 2023

Next TODOs are:

  • making it easier to call some of the functions (e.g. that they can be also called with Ints or Vectors instead of only Sets) -> ✅
  • better documentation of the functions
  • polishing parts of the code ✅
  • adapt and extend docs section "Reasoning about experiments" -> separate PR

Hope I find some time tomorrow and next week for it

@mschauer
Copy link
Owner Author

mschauer commented Jul 7, 2023

I agree with the TODOs. The "Reasoning about experiments" can be done in a separate PR if we want to be a bit more incremental.

@mwien
Copy link
Collaborator

mwien commented Jul 7, 2023

I agree with the TODOs. The "Reasoning about experiments" can be done in a separate PR if we want to be a bit more incremental.

Agreed, that's better

src/gensearch.jl Outdated Show resolved Hide resolved
src/gensearch.jl Outdated Show resolved Hide resolved
List all d-separators `Z` with `I subseteq Z subseteq R` for sets of vertices `X` and `Y` in `g`.
"""
function list_dseps(g, X, Y, I = Set{eltype(g)}(), R = setdiff(Set(vertices(g)), X, Y))
X, Y, I, R = toset.((X, Y, I, R))
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. the cast on X and Y is not necessary?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean for list functions specifically or also in other places?

Here find_dsep would cast X at every call (which happens during list_dseps``) if we don't do the cast here. It's true that find_dsepalso doesn't really need X to be a Set, however, e.g.find_covariate_adjustment``` and other functions do (because membership v in X is tested repeatedly), so at that point I thought it was cleanest to cast everything.

@mschauer
Copy link
Owner Author

Have you seen https://docs.julialang.org/en/v1/manual/style-guide/#Handle-excess-argument-diversity-in-the-caller ? I think it doesn't really apply here but in principle widening


 struct ConstraintIterator{T<:Integer, S, U<:AbstractSet{T}, F<:Function}
     g::SimpleDiGraph{T}
     X::S
     Y::S
     I::U
     R::U
     find::F
 end

and trusting the iteration to generate errors for unsupported types should work. I think here it doesn't matter at all that X and Y are of a certain type as long as the methods are defined we need in the iterator.

@mwien
Copy link
Collaborator

mwien commented Jul 10, 2023

Have you seen https://docs.julialang.org/en/v1/manual/style-guide/#Handle-excess-argument-diversity-in-the-caller ? I think it doesn't really apply here but in principle widening


 struct ConstraintIterator{T<:Integer, S, U<:AbstractSet{T}, F<:Function}
     g::SimpleDiGraph{T}
     X::S
     Y::S
     I::U
     R::U
     find::F
 end

and trusting the iteration to generate errors for unsupported types should work. I think here it doesn't matter at all that X and Y are of a certain type as long as the methods are defined we need in the iterator.

Agreed, that's better and more general.

I would still prefer to also cast X and Y in list_dseps etc, because it is more efficient for our use case (if we wouldn't cast it there it would get cast for every call of find = find_dsep).

@mwien
Copy link
Collaborator

mwien commented Jul 10, 2023

Other solution (instead of having a cast at the beginning of every function) would be to add X::Set{eltype(g)} etc to every function. Then, have a wrapper function which accepts other types, does a cast to Set and calls the other function.

Generally I think casting isn't so bad here, because the Sets are basically our internal representation for efficiency. We could have something else than Sets, like a special kind of bitvector representation, which supports fast membership queries and union/intersect etc. Then the first line of every function would also be to convert the arguments to this representation.

Anyway, I'm very open for suggestions because I'm not sure what's the cleanest way...

@mschauer
Copy link
Owner Author

prefer to also cast X and Y in list_dseps

I am fine with that

@mwien
Copy link
Collaborator

mwien commented Jul 10, 2023

I think I'm reasonably happy with the code for now :)

Having problems right now to build the docs locally (haven't done that before), so I wasn't able to check how the formatting looks yet.

But if I fix that, I think we could merge soon

@mschauer
Copy link
Owner Author

Can I just merge? Then we can see the docs.

@mwien
Copy link
Collaborator

mwien commented Jul 10, 2023

Go for it!

@mschauer mschauer merged commit fce71ee into mschauer:master Jul 10, 2023
4 checks passed
@mwien
Copy link
Collaborator

mwien commented Jul 10, 2023

Awesome :)

Having problems right now to build the docs locally (haven't done that before), so I wasn't able to check how the formatting looks yet.

I didn't realize docs/src/library.md needs to be modified...
well, that's why I didn't see anything 😅

Currently not on the PC, could add this later (and try locally first)

@mwien
Copy link
Collaborator

mwien commented Jul 10, 2023

I figured out how to work on the docs locally now. Opened a new PR for updating the docs: #85

We can also work on a new Example section there (or in a separate PR if we want a quick merge)...

@mschauer
Copy link
Owner Author

Out of curiosity: is all functionality canonical (e.g. available in DAGitty) or did you sneak in your own discoveries and innovations in here?

@mwien
Copy link
Collaborator

mwien commented Jul 11, 2023

In principle all the functionality is in DAGitty as well. The front-door adjustment algorithm is very new, but it has also been added to the experimental version of Dagitty (but I think no other package has it yet).

Still, I wrote everything from scratch. The main thing about my implementation is that it uses gensearch throughout (I also modified and simplified this a bit to my liking). In DAGitty there is a similar function, but for historical reasons I think it's not used everywhere. Credit goes to my colleague Benito van der Zander, who (I think) was the one that came up with the idea and convinced me that it's the cleanest way to proceed.

I think it really makes the code much more maintainable and I was able to code all the functions in a few afternoons.

Because I have a slightly different version of gensearch, I had to modify some minor things.

Then, there is this thing with the back-door adjustment for sets (in Dagitty there is no function for the classic back-door criterion afaik, only the adjustment criterion), where the "standard" generalization to sets (e.g. given by Pearl) is imo overly restrictive (tbh I don't really understand why it's stated there this way).

I think the way I implemented the listing algorithms with iterators is pretty neat. Every other package generates a vector of all (adjustment) sets, which might use a lot of memory or not terminate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement search for adjustment sets
3 participants