Annotate H5 clades through a node data JSON file instead of modifying metadata #25

Open
huddlej opened this issue May 2, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@huddlej
Contributor

huddlej commented May 2, 2024

Context

In conversation about #22, @trvrb noted:

I don't think this is part of the scope of this PR, but it would seem cleaner to me for this clade-labeling/add-clades.py script to instead just create a node data JSON with h5_label_clade rather than messing with the metadata file.

@lmoncla and I just had some confusion from different rules (refine, traits) asking for the metadata TSV vs the metadata-with-clade TSV. Though the function metadata_by_wildcards mostly solves this issue.

@jameshadfield noted that:

The only reason I can see to not do this is if we use this data in the filtering step. But we don't.

Description

We should modify scripts/add-clades.py to create a node data JSON file as output and update the workflow to make the resulting output an input to the export rule instead of a step that modifies the metadata.
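As a rough illustration of the proposed output, here is a minimal sketch of turning clade assignments into a node data JSON. It assumes the assignments are available as a two-column TSV with `name` and `clade` headers; those column names, the helper `clades_to_node_data`, and the example strain names are all hypothetical and are not taken from the current `scripts/add-clades.py`. The only parts that follow the Augur convention are the top-level `nodes` key and the attribute name `h5_label_clade` used in this issue.

```python
import csv
import io
import json


def clades_to_node_data(clades_tsv, attribute="h5_label_clade"):
    """Build a node data dictionary from a TSV of strain names and clades.

    `clades_tsv` is any file-like object with hypothetical `name` and
    `clade` columns. Strains without an assignment are left out.
    """
    reader = csv.DictReader(clades_tsv, delimiter="\t")
    nodes = {}
    for row in reader:
        # Skip empty assignments instead of writing blank attribute values.
        if row["clade"]:
            nodes[row["name"]] = {attribute: row["clade"]}
    return {"nodes": nodes}


# Hypothetical input: one assigned strain and one unassigned strain.
example = io.StringIO(
    "name\tclade\n"
    "A/example/1/2022\t2.3.4.4b\n"
    "A/example/2/2005\t\n"
)
node_data = clades_to_node_data(example)
print(json.dumps(node_data, indent=2))
```

A file written this way could then be passed to `augur export v2` through `--node-data` alongside the other node data JSONs, leaving the metadata TSV untouched.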

@huddlej
Contributor Author

huddlej commented Dec 4, 2024

Following up from a question by Pauline Trinh about this issue, we would probably copy the pattern used in this script from the seasonal flu workflow to generate a node data JSON file with the clade labels.

@jameshadfield
Member

jameshadfield commented Dec 17, 2024

we would probably copy the pattern used in this script from the seasonal flu workflow to generate a node data JSON file with the clade labels

Yup, node-data JSONs are more powerful in that they allow annotating internal nodes and labelling branches. Despite being called h5_label_clade I'm not sure of the big-picture intentions here regarding internal node colourings and branch labels. cc @lmoncla

I'd still recommend starting with a PR which keeps the current functionality but makes the Snakemake workflow simpler to reason about, and then adding internal nodes / branch labels in subsequent work, as desired.
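To make the point about internal nodes and branch labels concrete, here is a small sketch of what that follow-up work might produce. A node data JSON can carry a `branches` section next to `nodes`, where each entry holds a `labels` dictionary. The helper `add_branch_labels`, the label key `clade`, and the node names below are assumptions for illustration only; nothing in this thread settles how LABEL annotations would be mapped onto internal nodes.

```python
import json


def add_branch_labels(node_data, clade_roots, label_key="clade"):
    """Return a copy of `node_data` with a `branches` section added.

    `clade_roots` maps a node name to the clade label that should be
    drawn on the branch leading to that node (hypothetical input).
    """
    branches = {
        node: {"labels": {label_key: clade}}
        for node, clade in clade_roots.items()
    }
    # Keep the existing per-node attributes and add the labels beside them.
    return {**node_data, "branches": branches}


# Hypothetical node data with one tip and one internal node annotated.
tips_and_internal = {
    "nodes": {
        "A/example/1/2022": {"h5_label_clade": "2.3.4.4b"},
        "NODE_0000012": {"h5_label_clade": "2.3.4.4b"},
    }
}
labelled = add_branch_labels(tips_and_internal, {"NODE_0000012": "2.3.4.4b"})
print(json.dumps(labelled, indent=2))
```

The first step recommended above would only need the `nodes` section for tips; the `branches` section is the part that would require agreement on how internal nodes should be annotated.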

@lmoncla
Collaborator

lmoncla commented Dec 17, 2024

@jameshadfield can we hold off on this for now? We have nextclade working, but the historic, all classes assignments are not absolutely stellar so I want to retain annotations with LABEL for now. Happy to expand on this more, but I would prefer to keep as is for now!

@jameshadfield
Member

@jameshadfield can we hold off on this for now?

Sure thing. I'm not proposing / planning to do the work; I was just trying to caution others about changing our current functionality to add in internal nodes / branch labels for LABEL annotations without touching base with you first.

P.S. the original aim,

We should modify scripts/add-clades.py to create a node data JSON file as output and update the workflow to make the resulting output an input to the export rule instead of a step that modifies the metadata.

This wouldn't change at all how you actually run LABEL; it'd just make the phylogenetic Snakemake workflow easier to reason about by using an intermediate node-data JSON rather than adding a column to the main metadata TSV. If we still want that capability in the Snakemake workflow long term then we should still do this one day. Let's revisit once you've got the Nextclade stuff working.

P.P.S. please reach out if you want help incorporating the Nextclade outputs into the phylogenetic workflows!


3 participants