fix: bump cdot to 0.2.27, snakemake wrappers #58
Walkthrough
The pull request includes updates to the
Changes
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
workflow/rules/download.smk (1)
Line range hint 39-48: Consider adding a retry mechanism for FTP downloads
The curl command could benefit from retry logic for better resilience against network issues.
Consider updating the shell command:
shell: """(
-   curl --silent "https://ftp.ncbi.nih.gov/refseq/{params.species}/mRNA_Prot/{params.species_name}.files.installed" > {output}
+   curl --silent --retry 3 --retry-delay 15 "https://ftp.ncbi.nih.gov/refseq/{params.species}/mRNA_Prot/{params.species_name}.files.installed" > {output}
) >{log} 2>&1"""
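For readers who want to see what the suggested `--retry 3 --retry-delay 15` flags amount to, here is a minimal Python sketch of the same idea: one initial attempt, up to three further attempts, and a fixed pause between them. The helper name `with_retries` and its parameters are illustrative assumptions and are not part of the workflow. Note that curl itself only retries on errors it considers transient, whereas this sketch retries on any `OSError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    retries: int = 3,
    delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `action`; on OSError, retry up to `retries` more times.

    `delay` is the fixed number of seconds to wait between attempts.
    `sleep` is injectable so the waiting can be stubbed out in tests.
    (Hypothetical helper, not taken from the repository.)
    """
    attempt = 0
    while True:
        try:
            return action()
        except OSError:
            # Give up once the retry budget is exhausted.
            if attempt >= retries:
                raise
            attempt += 1
            sleep(delay)
```

Any callable that raises `OSError` on network failure can be passed as `action`, for example a small function wrapping `urllib.request.urlopen`, since `urllib.error.URLError` is a subclass of `OSError`.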
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- README.md (1 hunk)
- config/config.yaml (4 hunks)
- workflow/rules/download.smk (1 hunk)
- workflow/rules/validate.smk (1 hunk)
✅ Files skipped from review due to trivial changes (2)
- README.md
- workflow/rules/validate.smk
🔇 Additional comments (6)
workflow/rules/download.smk (2)
Line range hint 51-67: LGTM! Well-structured rule with proper dependency handling
The rule is well-designed with:
- Clear input/output definitions
- Proper conda environment specification
- Appropriate logging configuration
- Smart use of external script for complex download logic
13-13: Verify compatibility with major wrapper version upgrade
The change from v3.11.0 to v5.0.2 is a major version bump that might include breaking changes.
Let's verify the wrapper's compatibility:
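A check like this can also be automated. The sketch below is an assumption-laden illustration, not part of the repository: the helpers `parse_version` and `is_major_bump` are hypothetical, and they assume tags of the form `vMAJOR.MINOR.PATCH` such as the `v3.11.0` and `v5.0.2` mentioned above.

```python
import re
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Parse 'v3.11.0' or '3.11.0' into a tuple of integers."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def is_major_bump(old: str, new: str) -> bool:
    """Return True when the major component increases from `old` to `new`."""
    return parse_version(new)[0] > parse_version(old)[0]


if __name__ == "__main__":
    # The wrapper versions named in the review comment above.
    print(is_major_bump("v3.11.0", "v5.0.2"))
```

A `True` result only signals that breaking changes are possible under semantic versioning conventions; the wrapper changelog still has to be read to know whether any apply.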
config/config.yaml (4)
30-30: LGTM: Version bump aligns with PR objective
The cdot release version update to 0.2.27 is consistent with the PR title.
113-113: LGTM: Consistent version bump across genome builds
The cdot release version update to 0.2.27 maintains version parity with GRCh38-refseq.
216-217: LGTM: Consistent changes with GRCh38-ensembl
The changes mirror those made for GRCh38-ensembl, maintaining consistency across genome builds:
- cdot release version updated to 0.2.27
- File format changed from GFF3 to GTF
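Version parity across genome builds, as noted in the comments above, can be checked mechanically once the config is loaded into a dictionary. The sketch below is illustrative only: the nested layout and the key name `release` are assumptions, since the actual structure of `config/config.yaml` is not shown here, and the helper names are hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple


def find_key(node: Any, key: str, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every occurrence of `key` in nested dicts/lists."""
    if isinstance(node, dict):
        for name, child in node.items():
            child_path = f"{path}/{name}" if path else str(name)
            if name == key:
                yield child_path, child
            else:
                yield from find_key(child, key, child_path)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_key(child, key, f"{path}[{index}]")


def mismatches(config: Dict[str, Any], key: str, expected: Any) -> Dict[str, Any]:
    """Return the paths whose value under `key` differs from `expected`."""
    return {p: v for p, v in find_key(config, key) if v != expected}


if __name__ == "__main__":
    # Hypothetical layout; only the build names and 0.2.27 come from the review.
    config = {
        "GRCh38-refseq": {"cdot": {"release": "0.2.27"}},
        "GRCh38-ensembl": {"cdot": {"release": "0.2.27"}},
    }
    print(mismatches(config, "release", "0.2.27"))
```

An empty result means every build carries the expected value; any entry in the result names the path that was missed in the bump.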
154-155: Verify impact of GFF3 to GTF format change
While the version bump is consistent, the change from GFF3 to GTF format might affect downstream processing. Please ensure that:
- The workflow tools support GTF format
- The format change is intentional and aligned with the snakemake wrapper updates mentioned in the PR title
✅ Verification successful
The codebase analysis shows that the workflow is designed to handle both GTF and GFF formats through the CDOT tool, which is used for transcript annotation processing. The format change appears to be intentional and aligned with the version updates.
Format change from GFF3 to GTF is compatible with the workflow
The change from GFF3 to GTF format is safe because:
- The workflow uses CDOT (v0.2.27) which supports both GFF3 and GTF formats as evidenced by multiple references in the config file
- Both formats are used across different genome builds in the configuration (e.g., RefSeq uses GFF while Ensembl uses GTF)
- The format change aligns with the source-specific conventions where Ensembl typically provides GTF files
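To make concrete why a GFF3-to-GTF switch can matter to downstream parsing: the two formats share the first eight tab-separated columns but encode the ninth (attributes) column differently. GFF3 uses `key=value` pairs separated by semicolons, while GTF uses `key "value";` pairs. The sketch below is a simplified illustration with hypothetical helper names and made-up sample values; it is not how cdot parses these files, and it ignores URL-escaping, repeated keys, and unquoted GTF values.

```python
import re
from typing import Dict

# Matches GTF pairs of the form: key "value"
_GTF_PAIR = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_gff3_attributes(column: str) -> Dict[str, str]:
    """Parse a GFF3 attributes column such as 'ID=rna-1;Parent=gene-1'."""
    attrs: Dict[str, str] = {}
    for field in column.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key] = value
    return attrs


def parse_gtf_attributes(column: str) -> Dict[str, str]:
    """Parse a GTF attributes column such as 'gene_id "G1"; transcript_id "T1";'.

    Simplification: later duplicates of a key overwrite earlier ones.
    """
    return {key: value for key, value in _GTF_PAIR.findall(column)}


if __name__ == "__main__":
    # Sample attribute strings are invented for illustration.
    print(parse_gff3_attributes("ID=rna-1;Parent=gene-1"))
    print(parse_gtf_attributes('gene_id "G1"; transcript_id "T1";'))
```

A parser written for one syntax returns nothing useful on the other, which is why the review asks that the tools consuming these files support GTF.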
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for GFF3/GTF file handling in the workflow
echo "Checking for GFF3/GTF file handling in workflow files:"
rg -i "g[tf]f[3]?" workflow/ -A 2
Length of output: 154
Script:
#!/bin/bash
# Search for GFF3/GTF file handling in the entire codebase
echo "Checking for GFF3/GTF file handling across the codebase:"
rg -i "g[tf]f[3]?" -A 2
# Search for any annotation file handling in workflow rules
echo -e "\nChecking for annotation file handling in workflow rules:"
ast-grep --pattern 'rule $_ {
$$$
input:
$$$
annotation = $_
$$$
}'
# Check config file for any other GTF/GFF3 references
echo -e "\nChecking config file for GTF/GFF3 patterns:"
cat config/config.yaml
Length of output: 9058
Script:
#!/bin/bash
# Search for cdot configuration or usage in workflow files
echo "Checking for cdot configuration and usage:"
rg -i "cdot" workflow/ -A 3
# Search for any rules that process annotation files
echo -e "\nChecking for rules processing annotation files:"
fd -e smk -e snakefile . -x cat {} \; | rg -A 5 "rule.*annotation"
# Check if there are any documentation files about annotation formats
echo -e "\nChecking for documentation about annotation formats:"
fd -e md . -x rg -i "g[tf]f|annotation" {} \;
Length of output: 27303
Release-As: 0.9.0
Summary by CodeRabbit
New Features
- mehari-data version to v0.9.0.
Bug Fixes
Documentation
Chores