Add automatic detection of datasets #1244

Merged
merged 56 commits into from
Sep 18, 2023
a224aaf
feat(cli): scaffold `seq sort` cli command, input parsing and main loop
ivan-aksamentov Sep 1, 2023
16cf181
feat(cli): implement ref search
ivan-aksamentov Sep 1, 2023
0cf48f9
feat(cli): allow external minimizer index JSON file
ivan-aksamentov Sep 1, 2023
dfa63a6
feat(cli): improve error message text
ivan-aksamentov Sep 1, 2023
d327c77
fix(cli): version warning false positive
ivan-aksamentov Sep 1, 2023
8940bbb
feat(cli): add cli params for minimizer search algo
ivan-aksamentov Sep 1, 2023
b6b4fe8
Merge remote-tracking branch 'origin/master' into feat/ref-minimizer
ivan-aksamentov Sep 4, 2023
5b6bc87
fix(cli): allow minimizer index to be absent
ivan-aksamentov Sep 4, 2023
1b90f1b
feat(cli): write sorted fasta files
ivan-aksamentov Sep 4, 2023
db37dd3
fix(web): dataset equality
ivan-aksamentov Sep 4, 2023
8ac3f67
feat(web): add sequence sorting prototype
ivan-aksamentov Sep 5, 2023
bb5aa6a
fix: move wasm module into webworker
ivan-aksamentov Sep 5, 2023
d60ced5
fix(web): prioritize dataset-server URL param
ivan-aksamentov Sep 5, 2023
9b31210
fix(web): parentheses in the conditional
ivan-aksamentov Sep 5, 2023
603ffa0
feat(web): only show "Autodetect" entry when minimizer data is available
ivan-aksamentov Sep 5, 2023
23cb227
feat(web): improve message in GitHub URL errors
ivan-aksamentov Sep 5, 2023
7c9e5cc
fix(web): non-unique atom key
ivan-aksamentov Sep 5, 2023
8ec4530
refactor: lint
ivan-aksamentov Sep 6, 2023
32f6ee5
feat(web): display multiple relevant auto-detected refs
ivan-aksamentov Sep 6, 2023
5bcc43f
Merge remote-tracking branch 'origin/master' into feat/ref-minimizer
ivan-aksamentov Sep 7, 2023
816e727
feat(web): prototype new dataset autodetect ui
ivan-aksamentov Sep 7, 2023
638da57
feat(web): animate list updates to prevent flickering
ivan-aksamentov Sep 7, 2023
94578f4
perf: serialize directly to jsvalue to avoid json parsing
ivan-aksamentov Sep 7, 2023
2a97561
perf: batch updates from wasm to reduce main thread blocking
ivan-aksamentov Sep 7, 2023
c52e8d8
fix(web): duplicate id and label misalignment
ivan-aksamentov Sep 7, 2023
7c2a305
feat: count detections for all datasets, not just the top one
ivan-aksamentov Sep 7, 2023
ac91ecc
refactor: extract function
ivan-aksamentov Sep 8, 2023
af4841e
refactor: lint
ivan-aksamentov Sep 8, 2023
d136b84
refactor(cli): simplify check for completions cli args
ivan-aksamentov Sep 8, 2023
e33c04f
feat(cli): rename 'seq sort' subcommand to 'sort'
ivan-aksamentov Sep 8, 2023
bfa0e9b
refactor: remove unused import
ivan-aksamentov Sep 8, 2023
1a7c48e
Merge remote-tracking branch 'origin/master' into feat/ref-minimizer
ivan-aksamentov Sep 8, 2023
e30adc0
feat(cli): write sorted sequences to all common path prefixes
ivan-aksamentov Sep 8, 2023
619aab4
Merge remote-tracking branch 'origin/master' into feat/ref-minimizer
ivan-aksamentov Sep 8, 2023
ac7c2d5
feat(cli): prettify help
ivan-aksamentov Sep 8, 2023
61eef71
feat(cli): print all dataset detections in terminal output
ivan-aksamentov Sep 8, 2023
118f36b
Merge remote-tracking branch 'origin/master' into feat/ref-minimizer
ivan-aksamentov Sep 12, 2023
eb15094
feat: filter datasets in wasm to speedup web ui
ivan-aksamentov Sep 12, 2023
c4d2763
feat: remove list animations to speed things up
ivan-aksamentov Sep 12, 2023
5dfb741
fix: styled-components warning about updates being too frequent
ivan-aksamentov Sep 12, 2023
07526b8
feat: make batch flushes more frequent for faster visible response
ivan-aksamentov Sep 12, 2023
79a44fc
fix: typing
ivan-aksamentov Sep 12, 2023
f17cf32
fix: sorting of dataset list
ivan-aksamentov Sep 12, 2023
407921c
feat(cli): add dataset statistics into sort command printout
ivan-aksamentov Sep 12, 2023
a81bc88
feat(cli): print results of sort command only in verbose mode
ivan-aksamentov Sep 12, 2023
d4efb62
feat(cli): add output tsv file for sort command
ivan-aksamentov Sep 12, 2023
99fb0c8
fix(cli): add undetected entries into sort tsv
ivan-aksamentov Sep 12, 2023
131fedd
feat(web): reimplement main page
ivan-aksamentov Sep 14, 2023
cabb824
feat(web): prettify nav bar
ivan-aksamentov Sep 15, 2023
55093f9
fix(web): small styling
ivan-aksamentov Sep 15, 2023
68438ff
fix(web): use unique files ids as react keys
ivan-aksamentov Sep 15, 2023
01f3449
feat(web): scroll current dataset list item into view
ivan-aksamentov Sep 18, 2023
453cf3b
refactor: simplify list scrolling setup
ivan-aksamentov Sep 18, 2023
f0e0cdf
feat(web): preselect top dataset after suggestion is complete
ivan-aksamentov Sep 18, 2023
a002c12
refactor: lint
ivan-aksamentov Sep 18, 2023
996980d
fix(web): text of example entry in query sequence list
ivan-aksamentov Sep 18, 2023
13 changes: 11 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions packages_rs/nextclade-cli/Cargo.toml
@@ -28,6 +28,7 @@ lazy_static = "=1.4.0"
log = "=0.4.19"
nextclade = { path = "../nextclade" }
num_cpus = "=1.16.0"
ordered-float = { version = "=3.9.1", features = ["rand", "serde", "schemars"] }
owo-colors = "=3.5.0"
pretty_assertions = "=1.3.0"
rayon = "=1.7.0"
@@ -39,6 +40,7 @@ serde = { version = "=1.0.164", features = ["derive"] }
serde_json = { version = "=1.0.99", features = ["preserve_order", "indexmap", "unbounded_depth"] }
strum = "=0.25.0"
strum_macros = "=0.25"
tinytemplate = "=1.2.1"
url = { version = "=2.4.0", features = ["serde"] }
zip = { version = "=0.6.6", default-features = false, features = ["aes-crypto", "bzip2", "deflate", "time"] }

1 change: 1 addition & 0 deletions packages_rs/nextclade-cli/src/cli/mod.rs
@@ -3,4 +3,5 @@ pub mod nextclade_dataset_get;
pub mod nextclade_dataset_list;
pub mod nextclade_loop;
pub mod nextclade_ordered_writer;
pub mod nextclade_seq_sort;
pub mod verbosity;
88 changes: 85 additions & 3 deletions packages_rs/nextclade-cli/src/cli/nextclade_cli.rs
@@ -1,6 +1,7 @@
use crate::cli::nextclade_dataset_get::nextclade_dataset_get;
use crate::cli::nextclade_dataset_list::nextclade_dataset_list;
use crate::cli::nextclade_loop::nextclade_run;
use crate::cli::nextclade_seq_sort::nextclade_seq_sort;
use crate::cli::verbosity::{Verbosity, WarnLevel};
use crate::io::http_client::ProxyConfig;
use clap::builder::styling;
@@ -12,6 +13,7 @@ use itertools::Itertools;
use lazy_static::lazy_static;
use nextclade::io::fs::add_extension;
use nextclade::run::params::NextcladeInputParamsOptional;
use nextclade::sort::params::NextcladeSeqSortParams;
use nextclade::utils::global_init::setup_logger;
use nextclade::{getenv, make_error};
use std::fmt::Debug;
@@ -76,15 +78,20 @@ pub enum NextcladeCommands {
shell: String,
},

- /// Run alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
+ /// Run sequence analysis: alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
///
/// For short help type: `nextclade -h`, for extended help type: `nextclade --help`. Each subcommand has its own help, for example: `nextclade run --help`.
Run(Box<NextcladeRunArgs>),

- /// List and download available Nextclade datasets
+ /// List and download available Nextclade datasets (pathogens)
///
- /// For short help type: `nextclade -h`, for extended help type: `nextclade --help`. Each subcommand has its own help, for example: `nextclade run --help`.
+ /// For short help type: `nextclade -h`, for extended help type: `nextclade --help`. Each subcommand has its own help, for example: `nextclade dataset --help`.
Dataset(Box<NextcladeDatasetArgs>),

/// Sort sequences according to the inferred Nextclade dataset (pathogen)
///
/// For short help type: `nextclade -h`, for extended help type: `nextclade --help`. Each subcommand has its own help, for example: `nextclade sort --help`.
Sort(Box<NextcladeSortArgs>),
}

#[derive(Parser, Debug)]
@@ -621,6 +628,80 @@ pub struct NextcladeRunArgs {
pub other_params: NextcladeRunOtherParams,
}

#[allow(clippy::struct_excessive_bools)]
#[derive(Parser, Debug)]
#[clap(verbatim_doc_comment)]
pub struct NextcladeSortArgs {
/// Path to one or multiple FASTA files with input sequences
///
/// Supports the following compression formats: "gz", "bz2", "xz", "zst". If no files are provided, plain FASTA input is read from standard input (stdin).
///
/// See: https://en.wikipedia.org/wiki/FASTA_format
#[clap(value_hint = ValueHint::FilePath)]
pub input_fastas: Vec<PathBuf>,

/// Path to input minimizer index JSON file.
///
/// By default, the latest reference minimizer index is fetched from the dataset server (default or customized with the `--server` argument). If this argument is provided, the algorithm skips fetching the default index and uses the index provided in the JSON file.
///
/// Supports the following compression formats: "gz", "bz2", "xz", "zst". Use "-" to read uncompressed data from standard input (stdin).
#[clap(long, short = 'm')]
#[clap(value_hint = ValueHint::FilePath)]
pub input_minimizer_index_json: Option<PathBuf>,

/// Path to output directory
///
/// Sequences will be written in subdirectories: one subdirectory per dataset. Sequences inferred to belong to a particular dataset will be placed in the corresponding subdirectory. The subdirectory tree can be nested, depending on how dataset names are organized.
///
/// Mutually exclusive with `--output-path`.
///
#[clap(short = 'O', long)]
#[clap(value_hint = ValueHint::DirPath)]
#[clap(group = "outputs")]
pub output_dir: Option<PathBuf>,

/// Template string for the file path to output sorted sequences. A separate file will be generated per dataset.
///
/// The string should contain the template variable `{name}`, which is replaced with the dataset name. Note that if the dataset name contains slashes, they will be interpreted as path separators and subdirectories will be created.
///
/// Make sure you properly quote and/or escape the curly braces, so that your shell, programming language or pipeline manager does not attempt to substitute the variables.
///
/// Mutually exclusive with `--output-dir`.
///
/// If the provided file path ends with one of the supported extensions: "gz", "bz2", "xz", "zst", then the file will be written compressed. If the required directory tree does not exist, it will be created.
///
/// Example for bash shell:
///
/// --output-path='outputs/{name}/sorted.fasta.gz'
#[clap(short = 'o', long)]
#[clap(group = "outputs")]
pub output_path: Option<String>,

/// Path to output results TSV file
///
/// If the provided file path ends with one of the supported extensions: "gz", "bz2", "xz", "zst", then the file will be written compressed. Use "-" to write uncompressed to standard output (stdout). If the required directory tree does not exist, it will be created.
#[clap(short = 'r', long)]
#[clap(value_hint = ValueHint::FilePath)]
pub output_results_tsv: Option<String>,

#[clap(flatten, next_help_heading = "Algorithm")]
pub search_params: NextcladeSeqSortParams,

#[clap(flatten, next_help_heading = "Other")]
pub other_params: NextcladeRunOtherParams,

/// Use custom dataset server.
///
/// You can host your own dataset server, with one or more datasets, grouped into dataset collections, and use this server to provide datasets to users of Nextclade CLI and Nextclade Web. Refer to Nextclade dataset documentation for more details.
#[clap(long)]
#[clap(value_hint = ValueHint::Url)]
#[clap(default_value_t = Url::from_str(DATA_FULL_DOMAIN).expect("Invalid URL"))]
pub server: Url,

#[clap(flatten)]
pub proxy_config: ProxyConfig,
}

fn generate_completions(shell: &str) -> Result<(), Report> {
let mut command = NextcladeArgs::command();

@@ -907,5 +988,6 @@ pub fn nextclade_parse_cli_args() -> Result<(), Report> {
NextcladeDatasetCommands::List(dataset_list_args) => nextclade_dataset_list(dataset_list_args),
NextcladeDatasetCommands::Get(dataset_get_args) => nextclade_dataset_get(&dataset_get_args),
},
NextcladeCommands::Sort(seq_sort_args) => nextclade_seq_sort(&seq_sort_args),
}
}
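
The help text of `NextcladeSortArgs` describes two ways to lay out the sorted output: a `{name}` path template, and an output directory with one subdirectory per dataset. The following is a minimal, standard-library-only sketch of how such paths might be derived. It is not the code from this PR (which adds the `tinytemplate` crate as a dependency): the helper names, the `sequences.fasta` file name and the dataset name `flu/h3n2/ha` are all assumptions made for illustration.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not from the PR): substitute the dataset name for
/// the `{name}` variable in an output path template.
fn expand_output_template(template: &str, dataset_name: &str) -> Result<PathBuf, String> {
    if !template.contains("{name}") {
        return Err(format!(
            "template '{template}' does not contain the '{{name}}' variable"
        ));
    }
    // Slashes in the dataset name become path separators, so nested
    // dataset names produce nested subdirectories.
    Ok(PathBuf::from(template.replace("{name}", dataset_name)))
}

/// Hypothetical helper: output-directory mode, one subdirectory per dataset.
/// The file name `sequences.fasta` is an assumption, not taken from the PR.
fn path_in_output_dir(output_dir: &Path, dataset_name: &str) -> PathBuf {
    output_dir.join(dataset_name).join("sequences.fasta")
}

/// True when the path ends with one of the compression extensions
/// listed in the help text: "gz", "bz2", "xz", "zst".
fn is_compressed(path: &Path) -> bool {
    matches!(
        path.extension().and_then(|ext| ext.to_str()),
        Some("gz" | "bz2" | "xz" | "zst")
    )
}

fn main() {
    // `flu/h3n2/ha` is a made-up dataset name used only for this example.
    let templated = expand_output_template("outputs/{name}/sorted.fasta.gz", "flu/h3n2/ha")
        .expect("template should contain {name}");
    println!("{} (compressed: {})", templated.display(), is_compressed(&templated));

    let in_dir = path_in_output_dir(Path::new("outputs"), "flu/h3n2/ha");
    println!("{} (compressed: {})", in_dir.display(), is_compressed(&in_dir));
}
```

A real implementation would also need to create the parent directories before writing, as the help text promises ("If the required directory tree does not exist, it will be created"); `std::fs::create_dir_all` on `path.parent()` is one way to do that.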