Group together related commands in the graph visualization #229

Closed
krlmlr opened this issue Feb 4, 2018 · 24 comments

Comments

@krlmlr
Collaborator

krlmlr commented Feb 4, 2018

Can visNetwork visually group related commands (manually specified by the user) in a subgraph-like setting?

From https://graphviz.org:

screenshot from 2018-02-04 07-44-16

@krlmlr krlmlr changed the title Related commands Related commands in the graph visualization Feb 4, 2018
@wlandau
Member

wlandau commented Feb 14, 2018

How should we choose the groupings? My intuition tells me that graph theory has a straightforward answer somewhere.

@wlandau
Member

wlandau commented Feb 19, 2018

#233 might preserve the patterns that expanded commands came from, which would help with grouping related commands here.

@wlandau wlandau changed the title Related commands in the graph visualization Group together related commands in the graph visualization Feb 21, 2018
@wlandau
Member

wlandau commented Feb 23, 2018

Could features like this one bud into their own drake-focused visualization package? I believe drake should natively support basic network visualizations, but the possibilities are endless, and the code base will likely be long, complicated and difficult to test.

@krlmlr
Collaborator Author

krlmlr commented Feb 23, 2018

I was looking only for manual grouping, perhaps with a new column in the plan data frame?
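
A minimal sketch of what such a column could look like, using a plain data frame as the plan. The column name `group` and the example targets are assumptions for illustration; nothing in drake reads this column at this point in the thread.

```r
# Hypothetical plan with a user-supplied `group` column.
# The column name `group` is an assumption, not an existing drake feature.
plan <- data.frame(
  target  = c("raw", "clean", "fit"),
  command = c("read.csv('data.csv')", "na.omit(raw)", "lm(y ~ x, data = clean)"),
  group   = c("import", "import", "analysis"),
  stringsAsFactors = FALSE
)

# List the targets that belong to each manually assigned group.
targets_by_group <- split(plan$target, plan$group)
```

A renderer could then draw one subgraph per element of `targets_by_group`.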

@wlandau
Member

wlandau commented Feb 23, 2018

That sounds much easier.

@wlandau
Member

wlandau commented Feb 23, 2018

On the other hand, I have tried and failed to micromanage the vertical ordering of the nodes. Maybe it's because of the directed/leveled positioning and default Sugiyama igraph layout in render_drake_graph(), but from what I recall from early development, I doubt this feature will turn out well as long as we are using visNetwork. Here is where I think ggraph could help us. I'm not exactly sure about the implementation, but it should be straightforward with the output of dataframes_graph(). With ggraph, we lose interactivity, but there is a lot to gain in return.

@krlmlr
Collaborator Author

krlmlr commented Feb 23, 2018

I saw that vis.js can do clustering, but I'm not sure if it helps. What does the ggraph output for the basic example look like?

@wlandau
Member

wlandau commented Feb 23, 2018

Not sure yet, but eager to finally try out ggraph!

@wlandau
Member

wlandau commented Feb 26, 2018

It looks like ggraph may not have clustering, but I will search harder. visNetwork has visClusteringByGroup, though I am having trouble making more than one cluster at a time.

library(drake)
library(visNetwork)
con <- load_basic_example()
df <- dataframes_graph(con)
df$nodes
df$nodes$group <- paste0(df$nodes$status, "_", df$nodes$type)
g <- render_drake_graph(df)
visGroups(g, groupname = "imported_function") %>%
  visGroups(groupname = "outdated_object") %>%
  visClusteringByGroup(
    groups = c("imported_function", "outdated_object"))

capture

@krlmlr
Collaborator Author

krlmlr commented Feb 28, 2018

Works for me with a variant of the ?visGroups example from visNetwork:

library(visNetwork)
nodes <- data.frame(
  id = 1:10,
  label = paste("Label", 1:10),
  group = sample(c("A", "B"), 10, replace = TRUE)
)
edges <- data.frame(from = c(2, 5, 10), to = c(1, 2, 10))

visNetwork(nodes, edges) %>%
  visLegend() %>%
  visGroups(groupname = "A", color = "red", shape = "database") %>%
  visGroups(groupname = "B", color = "yellow", shape = "triangle") %>%
  visClusteringByGroup(c("A", "B"))

@wlandau
Member

wlandau commented Feb 28, 2018

Thanks, Kirill! Is this the kind of clustering you were imagining? Do you think it would be enough to list all the target names in the cluster, maybe with the label argument of visClusteringByGroup()?

@krlmlr
Collaborator Author

krlmlr commented Feb 28, 2018

For a collapsed cluster I'd rather only see its label and not the detailed target names in the cluster. I haven't thought about clustering in interactive visualizations, but this does look useful. For graphviz-based renderers we can use a similar logic for specifying the groups, even if the display will look different (see first post).

@AlexAxthelm
Collaborator

As a side note (after seeing the unconf thread): it might be worth searching the global environment for anything that looks like a drake plan (a tibble with the correct column names would be a good start) and using that as a rough basis for clustering. I usually have something along the lines of

data_plan <- drake_plan({importing data})
cleaning_plan <- drake_plan({cleaning functions})
analysis_plan <- drake_plan({analysis_functions})
reporting_plan <- drake_plan({reporting functions})
master_plan <- bind_rows(data_plan, cleaning_plan, analysis_plan, reporting_plan)
make(master_plan)

If I could see a simplified graph with 4 target-ish objects, so that I could easily tell how long importing takes or where in the plan the make failed, I would be happy. Maybe collapsing sub-plans that didn't get touched yet, or that built successfully? This could be a non-default behavior, but if I have 1000+ targets that are all at least similar, it would be nice (and would improve render time) if I didn't have to see them all.

@wlandau
Member

wlandau commented Mar 12, 2018

I like the general idea. Subplans define the natural clustering that I see most people using. People typically combine their plans with bind_rows() or similar.

bind_rows(data_plan, cleaning_plan, analysis_plan, reporting_plan)

What about bind_plans()?

master_plan <- bind_plans(
  data = data_plan,
  cleaning = cleaning_plan,
  analysis = analysis_plan,
  reporting = reporting_plan
)

bind_plans() would add an extra cluster or subplan column, where the names of the clusters would respect the argument names you provide. Eventually, we could even designate different future resource types to different subplans (re: #169).
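
A rough sketch of how the proposed function could behave, assuming plans are plain data frames with identical columns. The name `bind_plans_sketch()` and the `cluster` column name are hypothetical; this is not drake's implementation.

```r
# Hypothetical sketch of the proposal above; not drake's actual bind_plans().
# Each named argument is a plan, and its name becomes the cluster label.
bind_plans_sketch <- function(...) {
  plans <- list(...)
  stopifnot(
    length(plans) > 0,
    !is.null(names(plans)),
    all(names(plans) != "")
  )
  # Tag every row of each subplan with the argument name it was passed under.
  labelled <- Map(
    function(plan, name) {
      plan$cluster <- rep(name, nrow(plan))
      plan
    },
    plans, names(plans)
  )
  out <- do.call(rbind, unname(labelled))
  rownames(out) <- NULL
  out
}

data_plan <- data.frame(
  target = "raw",
  command = "read.csv('data.csv')",
  stringsAsFactors = FALSE
)
analysis_plan <- data.frame(
  target = c("fit", "fit_summary"),
  command = c("lm(y ~ x, data = raw)", "summary(fit)"),
  stringsAsFactors = FALSE
)
master_plan <- bind_plans_sketch(data = data_plan, analysis = analysis_plan)
```

Requiring named arguments keeps the grouping explicit, which avoids searching the user's environment for subplans.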

I hesitate to search the user's environment for (sub)plans because it seems a bit mysterious.

@AlexAxthelm
Collaborator

I like the idea to explicitly label subplans. Having that be a separate column would also open the door to multiple levels of grouping.

@rkrug
Contributor

rkrug commented Mar 12, 2018

Also, when using bind_plans() one could add the name of the subplan as a prefix into the target name which would make it much easier to deal with duplicate target names in different sub-plans. This could even be disabled via an argument if not wished.
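
A small sketch of the prefixing idea, with a switch to turn it off. The helper `prefix_targets()` and its arguments are hypothetical. Note that it only renames targets; commands that refer to renamed targets would also need rewriting, which this sketch does not attempt.

```r
# Hypothetical helper: prepend the subplan name to every target name.
# It does NOT rewrite commands that mention the renamed targets.
prefix_targets <- function(plan, subplan, prefix = TRUE, sep = "_") {
  if (prefix) {
    plan$target <- paste(subplan, plan$target, sep = sep)
  }
  plan
}

cleaning_plan <- data.frame(
  target = c("x", "y"),
  command = c("as.numeric(raw_x)", "as.numeric(raw_y)"),
  stringsAsFactors = FALSE
)

prefixed  <- prefix_targets(cleaning_plan, "cleaning")
untouched <- prefix_targets(cleaning_plan, "cleaning", prefix = FALSE)
```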

@AlexAxthelm
Collaborator

To expand on my idea from above, now that I'm in front of a real keyboard, the idea would be to have multiple levels of grouping, along the lines of:

plan = tribble(
  ~target,   ~command,        ~group,
  "x",       "seq(1, 10)",    "import",
  "y",       "seq(10, 1)",    "import",
  "x_clean", "as.numeric(x)", "cleaning",
  "y_clean", "as.numeric(y)", "cleaning",
  "z",       "y + 10",        "analysis",
  "y_lm",    "lm(x ~ y)",     c("analysis", "linear"),
  "z_lm",    "lm(x ~ z)",     c("analysis", "linear"),
  "y_glm",   "glm(x ~ y)",     c("analysis", "general"),
  "z_glm",   "glm(x ~ z)",     c("analysis", "general")
) %>% print()
# A tibble: 9 x 3
#  target  command       group    
#  <chr>   <chr>         <list>   
#1 x       seq(1, 10)    <chr [1]>
#2 y       seq(10, 1)    <chr [1]>
#3 x_clean as.numeric(x) <chr [1]>
#4 y_clean as.numeric(y) <chr [1]>
#5 z       y + 10        <chr [1]>
#6 y_lm    lm(x ~ y)     <chr [2]>
#7 z_lm    lm(x ~ z)     <chr [2]>
#8 y_glm   glm(x ~ y)    <chr [2]>
#9 z_glm   glm(x ~ z)    <chr [2]>

so that if, for example, z_glm failed to build, the build graph would show the "import", "cleaning", and "linear" groups as groups, but expand the "general" group, so that I could see the failed object.

The underlying assumption here is that plans contain targets that act similarly, so if I have many similar objects, I don't need to see the details about them unless something is wrong.

A loose sketch of what I'm thinking:
image
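
One way to decide which groups to collapse, sketched under assumptions: each node has a single `cluster` label and a `status` column, and the status value `"failed"` marks a failed build. All three names are assumptions made for this example.

```r
# Hypothetical node table; column names and status values are assumptions.
nodes <- data.frame(
  id      = c("x", "y", "x_clean", "y_lm", "z_glm"),
  cluster = c("import", "import", "cleaning", "linear", "general"),
  status  = c("up to date", "up to date", "up to date", "up to date", "failed"),
  stringsAsFactors = FALSE
)

# Keep any cluster containing a failed target expanded; collapse the rest.
failed_clusters <- unique(nodes$cluster[nodes$status == "failed"])
collapse <- setdiff(unique(nodes$cluster), failed_clusters)
```

Here `collapse` holds the clusters that can be drawn as single nodes, while "general" stays expanded so that `z_glm` remains visible.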

@wlandau
Member

wlandau commented Mar 13, 2018

Great ideas, Alex! It seems like we could implement them in drake itself even before #282 is implemented. If we do it cleanly, not much in dataframes_graph() or vis_drake_graph() would need to change. We could just take the clusters from the group column.

Permitting multiple groups (for example, c("analysis", "general") for z_glm) is the most complicated thing. Off the top of my head, I don't know if it makes sense for a pre-#282 implementation. I wonder if visNetwork supports clusters within clusters...

@AlexAxthelm
Collaborator

It appears that clustering in visNetwork is still experimental. http://datastorm-open.github.io/visNetwork/more.html

I think trying the one-level clustering would be a good first step. My machine won’t boot right now, or I would play around with it myself.

@wlandau
Member

wlandau commented Mar 13, 2018

Sure, that sounds like a good plan for base drake. We can allow multiple groups in bind_plans() and then use the first group listed for each target. Separate tools can extend this to account for multiple groups.
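
A sketch of taking the first listed group per target, assuming `group` is a list column as in the tribble example earlier in the thread. The derived `cluster` column name is an assumption.

```r
# Plan with a list column of groups, mirroring the earlier tribble example.
plan <- data.frame(
  target = c("x", "y_lm", "z_glm"),
  stringsAsFactors = FALSE
)
plan$group <- list(
  "import",
  c("analysis", "linear"),
  c("analysis", "general")
)

# Use only the first group of each target as its (single-level) cluster.
plan$cluster <- vapply(plan$group, function(g) g[[1]], character(1))
```

The full `group` column stays in the plan, so separate tools could still use the deeper levels.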

@wlandau
Member

wlandau commented Mar 13, 2018

Re: ropensci/unconf18#12 (comment), clusters are related to expansions and subplans in the DSL. cc @dapperjapper.

@wlandau
Member

wlandau commented Jun 30, 2018

I plan to start work on this in a new drakevis package once I have time to work on it in earnest.

@wlandau
Member

wlandau commented Jul 6, 2018

The cleanest solution I know falls right out of #376 (comment). Keeping wildcard information after expansion/evaluation seems massively useful for #229 (comment).

@wlandau
Member

wlandau commented Jul 6, 2018

6edf816 exposes all columns from the plan in drake_graph_info()$nodes, which gives us flexibility: clusters can be subplans, wildcards, etc. visNetwork clustering may not work out (datastorm-open/visNetwork#254) but manual clustering should be straightforward.
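
A sketch of what manual clustering could look like on node and edge data frames, without relying on visNetwork's own clustering. It assumes nodes have `id` and `cluster` columns and edges have `from` and `to` columns; `collapse_clusters()` is a hypothetical helper, and it assumes cluster names do not collide with node ids.

```r
# Hypothetical helper: contract every node of a collapsed cluster into one node.
# Assumes cluster names never coincide with existing node ids.
collapse_clusters <- function(nodes, edges, collapse) {
  # Map each node id to itself, or to its cluster name if that cluster collapses.
  new_id <- ifelse(nodes$cluster %in% collapse, nodes$cluster, nodes$id)
  names(new_id) <- nodes$id
  new_nodes <- data.frame(
    id = unique(unname(new_id)),
    stringsAsFactors = FALSE
  )
  new_edges <- data.frame(
    from = unname(new_id[edges$from]),
    to = unname(new_id[edges$to]),
    stringsAsFactors = FALSE
  )
  # Drop edges that now point within a single cluster, then remove duplicates.
  new_edges <- unique(new_edges[new_edges$from != new_edges$to, , drop = FALSE])
  rownames(new_edges) <- NULL
  list(nodes = new_nodes, edges = new_edges)
}

nodes <- data.frame(
  id = c("x", "y", "x_clean", "y_clean", "fit"),
  cluster = c("import", "import", "cleaning", "cleaning", "analysis"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c("x", "y", "x_clean", "y_clean"),
  to = c("x_clean", "y_clean", "fit", "fit"),
  stringsAsFactors = FALSE
)
collapsed <- collapse_clusters(nodes, edges, collapse = c("import", "cleaning"))
```

The result is a smaller pair of data frames that could be passed on to a renderer in place of the originals.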


5 participants