Support analysis of Influenza B sequence data #23

dfornika · 2024-06-24T21:52:29Z

The segment lengths as defined here:

Lines 302 to 311 in a71e39a

    
    segment_lengths = {
        'PB2': (2260, 2360),
        'PB1': (2260, 2360),
        'PA': (2120, 2250),
        'HA': (1650, 1800),
        'NP': (1480, 1580),
        'NA': (1250, 1560),
        'M': (975, 1030),
        'NS': (815, 900),
    }

...and here:

FluViewer/fluviewer/fluviewer.py

Lines 1589 to 1591 in a71e39a

    
    segment_lengths = {'PB2': (2260, 2360), 'PB1': (2260, 2360), 'PA': (2120, 2250),
                       'HA': (1650, 1800), 'NP': (1480, 1580), 'NA': (1250, 1560),
                       'M': (975, 1030), 'NS': (815, 900)}

...are appropriate for Flu A sequences, but not for Flu B.

Adjust the segment lengths to be compatible with both FluA and FluB sequences.

dfornika · 2024-07-05T16:27:47Z

This could be handled in a few ways:

- The user specifies which "mode" they want to run the analysis in up front using a command-line argument: "Flu A mode" or "Flu B mode".
- Relax the constraints that are in place so that they accommodate both Flu A and Flu B sequences (but are still effective in catching inappropriate segment lengths that we shouldn't see for either Flu A or Flu B).
- Dynamically detect which type of Flu sample is being analyzed, and use the appropriate segment length ranges (and possibly other QC/analysis criteria).
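
As a rough illustration of the second option (relaxing the constraints), the ranges could be widened per segment to cover both types. This is only a sketch: the Flu A values are copied from the snippets above, but the `FLU_B_LENGTHS` values are unvalidated placeholders, and `relax_ranges` is a hypothetical helper that does not exist in FluViewer.

```python
# Flu A ranges, as currently defined in fluviewer.py.
FLU_A_LENGTHS = {
    'PB2': (2260, 2360), 'PB1': (2260, 2360), 'PA': (2120, 2250),
    'HA': (1650, 1800), 'NP': (1480, 1580), 'NA': (1250, 1560),
    'M': (975, 1030), 'NS': (815, 900),
}

# ASSUMPTION: placeholder Flu B ranges, for illustration only.
# Real values would need to be derived from Flu B reference sequences.
FLU_B_LENGTHS = {
    'PB2': (2350, 2450), 'PB1': (2320, 2420), 'PA': (2250, 2360),
    'HA': (1830, 1930), 'NP': (1790, 1890), 'NA': (1500, 1610),
    'M': (1140, 1240), 'NS': (1050, 1150),
}


def relax_ranges(a, b):
    """Return, per segment, the smallest range that covers both inputs."""
    return {
        seg: (min(a[seg][0], b[seg][0]), max(a[seg][1], b[seg][1]))
        for seg in a
    }


RELAXED_LENGTHS = relax_ranges(FLU_A_LENGTHS, FLU_B_LENGTHS)
```

One drawback of this approach is that a merged range also accepts lengths that fall in the gap between the Flu A and Flu B ranges, which would be valid for neither type.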

We could also possibly combine aspects of multiple of these approaches. For example, we could take the dynamic approach by default, but allow the user to bypass that and "force" either Flu A or Flu B mode up-front. I think that may be the best approach.
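
A minimal sketch of that combined approach might look like the following. The `--flu-type` flag and the `resolve_flu_type` helper are hypothetical names used for illustration, not existing FluViewer options, and the detection step itself is left out: its result is simply passed in as `detected`.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # ASSUMPTION: hypothetical flag; 'auto' means "detect the type dynamically".
    parser.add_argument(
        '--flu-type',
        choices=['auto', 'A', 'B'],
        default='auto',
        help="Force Flu A or Flu B mode, or detect the type automatically.",
    )
    return parser.parse_args(argv)


def resolve_flu_type(requested, detected):
    """Pick the flu type to use for segment length checks.

    requested: value of --flu-type ('auto', 'A' or 'B').
    detected:  'A', 'B', or None if dynamic detection was inconclusive.
    """
    if requested != 'auto':
        # The user forced a mode up front, so bypass detection.
        return requested
    if detected is None:
        raise ValueError(
            "Could not detect flu type; re-run with --flu-type A or --flu-type B"
        )
    return detected
```

The resolved type could then be used to select the matching set of segment length ranges before any of the length checks run.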

stefkary · 2024-07-05T18:16:56Z

Would it be possible to provide a "sample sheet" that specifies which samples & controls should be handled as FluA and which as FluB?

dfornika · 2024-07-05T18:39:37Z

Thanks for that suggestion @stefkary. I think the place we'd like to handle that would be our nextflow wrapper for FluViewer, which is here: https://github.com/BCCDC-PHL/fluviewer-nf

FluViewer itself is focused on analysis of a single sample at a time. The nextflow wrapper would handle running analysis on multiple samples. I've added an issue there: BCCDC-PHL/fluviewer-nf#14

dfornika · 2024-07-06T14:57:05Z

Ok @stefkary, we've added support for samplesheet input on our BCCDC-PHL/fluviewer-nf pipeline. We're currently only collecting info on the sample ID and the R1 and R2 Illumina fastq files through the samplesheet. Once we've incorporated the ability to specify a "Flu A mode" and "Flu B mode" via command-line arguments here, we'll consider how we can incorporate that into the samplesheet.

We'll also plan to support long (nanopore) read input via the samplesheet once that has been added here.

dfornika changed the title ~~Adjust segment length checks to be compatible with Influenza B sequences~~ Support analysis of Influenza B sequence data Jul 5, 2024
