Support analysis of Influenza B sequence data #23

Open
dfornika opened this issue Jun 24, 2024 · 4 comments

Comments

@dfornika
Member

The segment lengths as defined here:

segment_lengths = {
    'PB2': (2260, 2360),
    'PB1': (2260, 2360),
    'PA': (2120, 2250),
    'HA': (1650, 1800),
    'NP': (1480, 1580),
    'NA': (1250, 1560),
    'M': (975, 1030),
    'NS': (815, 900),
}

...and here:

segment_lengths = {'PB2': (2260, 2360), 'PB1': (2260, 2360), 'PA': (2120, 2250),
'HA': (1650, 1800), 'NP': (1480, 1580), 'NA': (1250, 1560),
'M': (975, 1030), 'NS': (815, 900)}

...are appropriate for Flu A sequences, but not for Flu B.

Adjust the segment length ranges to be compatible with both Flu A and Flu B sequences.
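For context, a check against these ranges might look like the minimal sketch below. The function name `segment_length_ok` is hypothetical and not taken from FluViewer, and treating the bounds as inclusive is an assumption; only the `segment_lengths` values come from the code quoted above.

```python
# Flu A ranges as quoted in this issue: (min_length, max_length) per segment.
segment_lengths = {
    'PB2': (2260, 2360),
    'PB1': (2260, 2360),
    'PA': (2120, 2250),
    'HA': (1650, 1800),
    'NP': (1480, 1580),
    'NA': (1250, 1560),
    'M': (975, 1030),
    'NS': (815, 900),
}


def segment_length_ok(segment, length, ranges=segment_lengths):
    """Return True if `length` is within the range for `segment`.

    Hypothetical helper; bounds are assumed to be inclusive.
    """
    low, high = ranges[segment]
    return low <= length <= high


print(segment_length_ok('HA', 1701))  # True: 1650 <= 1701 <= 1800
```

Any segment whose length falls outside its range fails this kind of check, regardless of whether the sequence is otherwise valid for its flu type.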

@dfornika
Member Author

dfornika commented Jul 5, 2024

This could be handled in a few ways:

  • The user specifies which "mode" they want to run the analysis in up front using a command-line argument: "Flu A mode" or "Flu B mode".
  • Relax the constraints that are in place so that they accommodate both Flu A and Flu B sequences (but are still effective in catching inappropriate segment lengths that we shouldn't see for either Flu A or Flu B).
  • Dynamically detect which type of Flu sample is being analyzed, and use the appropriate segment length ranges (and possibly other QC/analysis criteria).
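
The second option above could be sketched as follows: for each segment, take the smallest lower bound and the largest upper bound across the per-type range tables. The helper name `relax_ranges` is hypothetical, and since Flu B ranges are not given in this thread, the function takes the range tables as arguments rather than hard-coding them.

```python
def relax_ranges(*range_sets):
    """Combine several {segment: (low, high)} dicts into one permissive dict.

    Hypothetical helper: for each segment, the result spans from the lowest
    lower bound to the highest upper bound seen in any input dict.
    """
    combined = {}
    for ranges in range_sets:
        for segment, (low, high) in ranges.items():
            if segment in combined:
                cur_low, cur_high = combined[segment]
                combined[segment] = (min(cur_low, low), max(cur_high, high))
            else:
                combined[segment] = (low, high)
    return combined
```

One trade-off of this approach is that a widened range also accepts any length that falls in the gap between the two types' ranges, so the check becomes weaker for both.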

We could also combine aspects of several of these approaches. For example, we could take the dynamic approach by default, but allow the user to bypass it and "force" either Flu A or Flu B mode up front. I think that may be the best approach.
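
A minimal sketch of that combined approach, using Python's standard `argparse` module, might look like this. The `--flu-type` flag, its `auto`/`A`/`B` values, and both function names are assumptions made for illustration; none of them exist in FluViewer as far as this thread shows, and the detection step itself is left out.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical --flu-type option."""
    parser = argparse.ArgumentParser()
    # 'auto' (the default) means "detect the flu type from the data".
    parser.add_argument('--flu-type', choices=['auto', 'A', 'B'], default='auto')
    return parser


def resolve_flu_type(requested, detected):
    """Prefer an explicit user choice; otherwise fall back to the detected type.

    `detected` is whatever a (not shown) detection step returned: 'A', 'B',
    or None if detection failed.
    """
    if requested != 'auto':
        return requested
    if detected is None:
        raise ValueError('Could not detect flu type; pass --flu-type A or B')
    return detected


args = build_parser().parse_args(['--flu-type', 'B'])
print(resolve_flu_type(args.flu_type, detected='A'))  # B: the user's choice wins
```

Keeping the override separate from detection means a forced mode never depends on the detection step succeeding.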

@dfornika changed the title from "Adjust segment length checks to be compatible with Influenza B sequences" to "Support analysis of Influenza B sequence data" on Jul 5, 2024
@stefkary

stefkary commented Jul 5, 2024

Would it be possible to provide a "sample sheet" that specifies which samples and controls should be handled as Flu A and which as Flu B?

@dfornika
Member Author

dfornika commented Jul 5, 2024

Thanks for that suggestion @stefkary. I think the place to handle that would be our Nextflow wrapper for FluViewer, which is here: https://github.com/BCCDC-PHL/fluviewer-nf

FluViewer itself is focused on analysis of a single sample at a time. The Nextflow wrapper would handle running analysis on multiple samples. I've added an issue there: BCCDC-PHL/fluviewer-nf#14

@dfornika
Member Author

dfornika commented Jul 6, 2024

Ok @stefkary, we've added support for samplesheet input to our BCCDC-PHL/fluviewer-nf pipeline. We're currently only collecting the sample ID and the R1 and R2 Illumina fastq files through the samplesheet. Once we've added the ability to specify a "Flu A mode" and "Flu B mode" via command-line arguments here, we'll consider how to incorporate that into the samplesheet.

We'll also plan to support long (nanopore) read input via the samplesheet once that has been added here.
