Compatibility with custom dataset #58

Merged 50 commits on Dec 4, 2023
Changes from 8 commits
Commits
bb2dcb4
make extractTraits() workflow execute only if cohort is UKBB
roskamsh Jan 19, 2023
c8e1abe
update configuration to be consistent with the new nextflow parameters
roskamsh Jan 19, 2023
052acb4
update from_actors config file with new general parameters
roskamsh Jan 19, 2023
a71a758
add trace file
roskamsh Jan 19, 2023
315302e
Resolved merge conflict by incorporating both suggestions.
roskamsh Jan 19, 2023
f5923d8
Merge branch 'main' into brh_data_source
roskamsh Feb 6, 2023
3f2fcb2
make QC file optional for filterBED process
roskamsh Feb 13, 2023
fc83c9d
update .gitignore
roskamsh Feb 20, 2023
e16a4a1
Merge branch 'main' into brh_data_source
roskamsh Sep 14, 2023
4d6eaa7
update path to test bed and bgen files
roskamsh Sep 14, 2023
33fd9d5
update path to bed and bgen files for test
roskamsh Sep 14, 2023
b8908f6
add FlashPCA exclusion regions lifted over to hg38
roskamsh Oct 18, 2023
20d8bc9
clean up logic for filterBED script
roskamsh Oct 18, 2023
3923dc4
updated to correct flag name qcfile
roskamsh Oct 18, 2023
dcef8e3
update container to brh_data_source
roskamsh Oct 18, 2023
acb185f
add runPCA subworkflow for running PCA outside of main workflow
roskamsh Oct 18, 2023
3a778c4
add default LD_BLOCKS value
roskamsh Oct 19, 2023
854f735
reassign FlashPCA output to original name
roskamsh Oct 19, 2023
f1f8e4e
make LD blocks file optional
roskamsh Oct 19, 2023
9a00c33
Merge branch 'main' into brh_data_source
roskamsh Nov 27, 2023
569aaf1
add data/ folder with PCA exclusion regions for hg19
roskamsh Nov 27, 2023
b84d21e
add PCA exclusion regions for hg38
roskamsh Nov 27, 2023
4c7dae7
move out of test/ directory
roskamsh Nov 27, 2023
c1b218f
update
roskamsh Nov 27, 2023
d58c399
revert to tl-core package, as this is now updated
roskamsh Nov 28, 2023
224365c
revert to tl-core package, as this is now updated to new version
roskamsh Nov 28, 2023
0fe4880
add new test function
roskamsh Nov 28, 2023
9a91ff7
add new test non_ukbb
roskamsh Nov 28, 2023
78f0d4c
Add new test that has COHORT != UKBB and runs without LD block or QC …
roskamsh Nov 28, 2023
aceda2f
Add input file for new test
roskamsh Nov 28, 2023
c3a1eb2
Add julia test file for new CI test, non-UKBB setting
roskamsh Nov 28, 2023
a24e48a
add new test to CI
roskamsh Nov 28, 2023
6866347
update negative-controls docker image to v0.2
roskamsh Nov 29, 2023
144a28a
Add documentation for running TarGene with a custom dataset
roskamsh Nov 29, 2023
0ddd647
add COHORT to nextflow parameters, and update LD_BLOCKS and QCC_FILE d…
roskamsh Nov 29, 2023
6a3d962
rename test files and make FLASHPCA_EXCLUSION_REGIONS default to hg19
olivierlabayle Nov 30, 2023
b430630
change NOT_UKB to CUSTOM
olivierlabayle Nov 30, 2023
3f4aee9
update doc
olivierlabayle Nov 30, 2023
68b5bcc
more doc update
olivierlabayle Nov 30, 2023
86bb82a
move extractTraits cohort logic to extractTraits sub-workflow
roskamsh Dec 4, 2023
7dc158f
Merge branch 'brh_data_source' of github.com:TARGENE/targene-pipeline…
roskamsh Dec 4, 2023
a75f411
revert to previous decrypted_dataset channel declaration in non-UKBB …
roskamsh Dec 4, 2023
2a37b83
ensure extracted_traits is a value channel
roskamsh Dec 4, 2023
dee87a7
remove overly stringent test
roskamsh Dec 4, 2023
1b96a06
remove overly stringent test, check_n_failed_traits
roskamsh Dec 4, 2023
d41f45e
add new test for checking number of runs for generateIIDGenotypes pro…
roskamsh Dec 4, 2023
bd60111
add input csv files with multiple TFs case
roskamsh Dec 4, 2023
efa437a
update configuration to run from_actors mode
roskamsh Dec 4, 2023
8016bd0
update test to cover from_actors mode
roskamsh Dec 4, 2023
2f22712
rename files
olivierlabayle Dec 4, 2023
3 changes: 2 additions & 1 deletion .gitignore
@@ -11,4 +11,5 @@ test/grmmatrix
 LocalPreferences.toml
 test/sandbox.jl
 docs/Manifest.toml
-*.DS_Store
+*.DS_Store
+trace.txt*
7 changes: 4 additions & 3 deletions conf/ci_jobs/from_actors.config
@@ -1,8 +1,9 @@
 params {
 PARAMETER_PLAN = "FROM_ACTORS"
 DECRYPTED_DATASET = "test/data/dataset.csv"
-UKBB_BED_FILES = "test/data/bed/ukb_chr{1,2,3}.{bed,bim,fam}"
-UKBB_BGEN_FILES = "test/data/bgen/ukb_chr{1,2,3}.{bgen,bgen.bgi,sample}"
+COHORT = "UKBB"
+BED_FILES = "test/data/bed/ukb_chr{1,2,3}.{bed,bim,fam}"
+BGEN_FILES = "test/data/bgen/ukb_chr{1,2,3}.{bgen,bgen.bgi,sample}"
 QC_FILE = "test/data/qc_file.csv"
 LD_BLOCKS = "test/data/ld_blocks.txt"
 FLASHPCA_EXCLUSION_REGIONS = "test/data/exclusion_regions_hg19.txt"
@@ -25,4 +26,4 @@ process {
 memory = '10G'
 clusterOptions = { task.memory ? "-l mem_free=20G,h_vmem=${task.memory.bytes/task.cpus}" : null }
 }
-}
+}
7 changes: 4 additions & 3 deletions conf/ci_jobs/from_param_files.config
@@ -2,8 +2,9 @@ params {
 PARAMETER_PLAN = "FROM_PARAM_FILES"
 PARAMETER_FILES = "test/data/parameters/parameter_*"
 DECRYPTED_DATASET = "test/data/dataset.csv"
-UKBB_BED_FILES = "test/data/bed/ukb_chr{1,2,3}.{bed,bim,fam}"
-UKBB_BGEN_FILES = "test/data/bgen/ukb_chr{1,2,3}.{bgen,bgen.bgi,sample}"
+COHORT = "UKBB"
+BED_FILES = "test/data/bed/ukb_chr{1,2,3}.{bed,bim,fam}"
+BGEN_FILES = "test/data/bgen/ukb_chr{1,2,3}.{bgen,bgen.bgi,sample}"
 QC_FILE = "test/data/qc_file.csv"
 LD_BLOCKS = "test/data/ld_blocks.txt"
 FLASHPCA_EXCLUSION_REGIONS = "test/data/exclusion_regions_hg19.txt"
@@ -12,4 +13,4 @@ params {
 TRAITS_CONFIG = "test/data/ukbconfig_small.yaml"
 ESTIMATORFILE = "test/data/estimator.yaml"
 NB_VAR_ESTIMATORS = 0
-}
+}
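
For orientation: main.nf hands the BED_FILES pattern to Channel.fromFilePairs with size: 3 and a closure returning file.baseName, so the three PLINK files of each chromosome should end up grouped under one key. The Python sketch below only mimics that grouping for the test pattern used in these configs. The helper names expand_braces and group_by_stem are hypothetical, only non-nested brace groups are handled, and this is not how Nextflow implements globbing.

```python
import itertools
import re
from collections import defaultdict
from pathlib import PurePosixPath


def expand_braces(pattern):
    """Expand every non-nested {a,b,c} group in pattern into concrete strings."""
    parts = re.split(r"\{([^{}]*)\}", pattern)
    # Even indices hold literal text, odd indices hold comma-separated alternatives.
    choices = [p.split(",") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


def group_by_stem(paths):
    """Group paths by file name without its last extension (like file.baseName)."""
    groups = defaultdict(list)
    for path in paths:
        groups[PurePosixPath(path).stem].append(path)
    return dict(groups)


paths = expand_braces("test/data/bed/ukb_chr{1,2,3}.{bed,bim,fam}")
groups = group_by_stem(paths)
for key, files in groups.items():
    print(key, files)
```

Each key should carry exactly three files, which is what size: 3 asks for. A custom dataset whose BED, BIM and FAM files do not share a base name would presumably not group this way.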
21 changes: 14 additions & 7 deletions main.nf
@@ -17,8 +17,10 @@ params.PVAL_SIEVE = 0.05

 params.OUTDIR = "$launchDir/results"

+params.COHORT = "UKBB"
 params.TRAITS_CONFIG = "NO_UKB_TRAIT_CONFIG"
 params.WITHDRAWAL_LIST = 'NO_WITHDRAWAL_LIST'
+params.QC_FILE = "NO_QC_FILE"

 params.PHENOTYPES_BATCH_SIZE = 0
 params.EXTRA_CONFOUNDERS = 'NO_EXTRA_CONFOUNDER'
@@ -61,7 +63,7 @@ workflow generateIIDGenotypes {
 qc_file = Channel.value(file("$params.QC_FILE"))
 flashpca_excl_reg = Channel.value(file("$params.FLASHPCA_EXCLUSION_REGIONS"))
 ld_blocks = Channel.value(file("$params.LD_BLOCKS"))
-bed_files_ch = Channel.fromFilePairs("$params.UKBB_BED_FILES", size: 3, checkIfExists: true){ file -> file.baseName }
+bed_files_ch = Channel.fromFilePairs("$params.BED_FILES", size: 3, checkIfExists: true){ file -> file.baseName }

 IIDGenotypes(flashpca_excl_reg, ld_blocks, bed_files_ch, qc_file, traits)

@@ -89,7 +91,7 @@ workflow generateTMLEEstimates {

 main:
 estimator_file = Channel.value(file("$params.ESTIMATORFILE", checkIfExists: true))
-bgen_files = Channel.fromPath("$params.UKBB_BGEN_FILES", checkIfExists: true).collect()
+bgen_files = Channel.fromPath("$params.BGEN_FILES", checkIfExists: true).collect()

 if (params.PARAMETER_PLAN == "FROM_ACTORS") {
 bqtls = Channel.value(file("$params.BQTLS"))
@@ -149,18 +151,23 @@ workflow generateSieveEstimates {
 }

 workflow {
-// Extract traits
-extractTraits()
+// Extract traits for UKBB
+if (params.COHORT == "UKBB") {
+extractTraits()
+phenoInput = extractTraits.out
+} else {
+phenoInput = Channel.fromPath("$params.DECRYPTED_DATASET", checkIfExists: true)
+}

 // Generate IID Genotypes
-generateIIDGenotypes(extractTraits.out)
+generateIIDGenotypes(phenoInput)

 // Genetic confounders
 geneticConfounders(generateIIDGenotypes.out)

 // generate estimates
 generateTMLEEstimates(
-extractTraits.out,
+phenoInput,
 geneticConfounders.out,
 )

@@ -174,4 +181,4 @@ workflow {
 }

 MergeOutputs(generateTMLEEstimates.out.tmle_csvs.collect(), sieve_csv)
-}
+}
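
The main workflow change boils down to one decision: with COHORT set to "UKBB" the phenotypes come from extractTraits, otherwise DECRYPTED_DATASET is read directly. A minimal Python sketch of that branching follows, under stated assumptions: select_phenotype_source is a hypothetical name, the explicit error for a missing DECRYPTED_DATASET stands in for checkIfExists: true, and "CUSTOM" is just one non-UKBB value (in this diff any value other than "UKBB" takes the else branch).

```python
def select_phenotype_source(params):
    """Return (source, path) describing where the phenotype dataset comes from."""
    # main.nf defaults params.COHORT to "UKBB" in this diff.
    cohort = params.get("COHORT", "UKBB")
    if cohort == "UKBB":
        # UKBB runs go through the extractTraits sub-workflow.
        return ("extractTraits", None)
    dataset = params.get("DECRYPTED_DATASET")
    if not dataset:
        # Assumption: fail early, similar in spirit to checkIfExists: true.
        raise ValueError("DECRYPTED_DATASET is required when COHORT is not UKBB")
    # Any other cohort uses the dataset file as-is.
    return ("file", dataset)


ukbb = select_phenotype_source({"COHORT": "UKBB"})
custom = select_phenotype_source(
    {"COHORT": "CUSTOM", "DECRYPTED_DATASET": "test/data/dataset.csv"}
)
print(ukbb)
print(custom)
```

Either way the downstream steps receive a single phenotype input, which is why generateIIDGenotypes and generateTMLEEstimates both switch from extractTraits.out to phenoInput.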
27 changes: 19 additions & 8 deletions modules/genotypes.nf
@@ -1,6 +1,6 @@
 process filterBED{
 label 'bigmem'
-container "olivierlabayle/tl-core:v0.3.0"
+container "olivierlabayle/tl-core:brh_data_source"
 publishDir "$params.OUTDIR/qc_filtered_chromosomes", mode: 'symlink'

 input:
@@ -14,12 +14,23 @@ process filterBED{

 script:
 prefix = bedfiles[0].toString().minus('.bed')
-"""
-TEMPD=\$(mktemp -d)
-JULIA_DEPOT_PATH=\$TEMPD:/opt julia --project=/TargeneCore.jl --startup-file=no /TargeneCore.jl/bin/prepare_confounders.jl \
---input $prefix --output filtered.$prefix --qcfile $qcfile --maf-threshold $params.MAF_THRESHOLD --ld-blocks $ld_blocks --traits $traits filter
-"""
-
+
+script="TEMPD=\$(mktemp -d)\n"
+if (params.QC_FILE == "NO_QC_FILE") {
+script += """
+JULIA_DEPOT_PATH=\$TEMPD:/opt julia --project=/TargeneCore.jl --startup-file=no /TargeneCore.jl/bin/prepare_confounders.jl \
+--input $prefix --output filtered.$prefix --maf-threshold $params.MAF_THRESHOLD --ld-blocks $ld_blocks --traits $traits filter
+"""
+.stripIndent()
+} else {
+script += """
+JULIA_DEPOT_PATH=\$TEMPD:/opt julia --project=/TargeneCore.jl --startup-file=no /TargeneCore.jl/bin/prepare_confounders.jl \
+--input $prefix --output filtered.$prefix --qcfile $qcfile --maf-threshold $params.MAF_THRESHOLD --ld-blocks $ld_blocks --traits $traits filter
+"""
+.stripIndent()
+}
+
+script
 }


@@ -95,4 +106,4 @@ workflow IIDGenotypes{

 emit:
 SampleQCFilter.out
-}
+}
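
The two branches added to filterBED differ only by the --qcfile flag. As a minimal sketch of the same decision with the flag appended conditionally (in Python, not Nextflow): build_filter_args is a hypothetical helper, the file names and the 0.01 threshold are illustrative values, and only flags visible in the diff are used.

```python
NO_QC_FILE = "NO_QC_FILE"  # sentinel default given to params.QC_FILE in main.nf


def build_filter_args(prefix, traits, ld_blocks, maf_threshold, qc_file=NO_QC_FILE):
    """Return the prepare_confounders.jl arguments, adding --qcfile only when set."""
    args = ["--input", prefix, "--output", f"filtered.{prefix}"]
    if qc_file != NO_QC_FILE:
        # Mirrors the else branch of the diff: the QC file is passed through.
        args += ["--qcfile", qc_file]
    args += [
        "--maf-threshold", str(maf_threshold),
        "--ld-blocks", ld_blocks,
        "--traits", traits,
        "filter",
    ]
    return args


without_qc = build_filter_args("ukb_chr1", "traits.csv", "ld_blocks.txt", 0.01)
with_qc = build_filter_args(
    "ukb_chr1", "traits.csv", "ld_blocks.txt", 0.01, qc_file="qc_file.csv"
)
print(" ".join(without_qc))
print(" ".join(with_qc))
```

Building the argument list once and appending the optional flag keeps the shared part of the command in a single place, so a future flag change cannot drift between the two branches.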