Commit

…ensus into fix/history
marikaris committed Sep 3, 2020
2 parents 109f85c + f1edf12 commit c33f402
Showing 7 changed files with 84 additions and 22 deletions.
3 changes: 3 additions & 0 deletions Jenkinsfile
@@ -35,6 +35,9 @@ pipeline {
sh "python setup.py test"
sh "pip install ."
}
container('sonar') {
sh "sonar-scanner -Dsonar.github.oauth=${env.GITHUB_TOKEN} -Dsonar.pullrequest.base=${CHANGE_TARGET} -Dsonar.pullrequest.branch=${BRANCH_NAME} -Dsonar.pullrequest.key=${env.CHANGE_ID} -Dsonar.pullrequest.provider=GitHub -Dsonar.pullrequest.github.repository=molgenis/molgenis-py-consensus"
}
}
}
stage('Build: [ master ]') {
54 changes: 38 additions & 16 deletions README.md
@@ -71,7 +71,8 @@ The header of the tab separated file contains the following values: `"timestamp"
"classification", "last_updated_by", "last_updated_on"`. Except for `"timestamp"` and `"id"`, these are the columns as
delivered by Alissa Interpret. They are first imported into MOLGENIS using the "Amazon bucket file ingest"
feature in the [MOLGENIS scheduled jobs plugin](https://molgenis.gitbooks.io/molgenis/content/guide-schedule.html).
From there the files are downloaded as csv and put into the inbox folder of the pipeline.
From there the files are downloaded as CSV, converted into tab-delimited (.txt) files, and put into the inbox folder of the pipeline. The script `download_raw_lab_files.sh` downloads the Alissa data automatically.
Before starting the file ingest, make sure that the `vkgl_raw_*labname*` tables are empty.
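The CSV-to-tab-delimited conversion step can be done with Python's standard `csv` module. The sketch below assumes UTF-8 input with a single header row. The helper name `csv_to_tab_delimited` is hypothetical and is not part of this repository.

```python
import csv
from pathlib import Path


def csv_to_tab_delimited(csv_path: Path, txt_path: Path) -> int:
    """Rewrite a comma-separated file as a tab-delimited .txt file.

    Hypothetical helper: returns the number of data rows written (header excluded).
    """
    with csv_path.open(newline="", encoding="utf-8") as src, \
            txt_path.open("w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        # The default quoting (QUOTE_MINIMAL) only quotes fields that contain
        # special characters such as a tab, a quote character or a newline.
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        header = next(reader, None)
        if header is None:
            return 0  # empty input: an empty .txt file is left behind
        writer.writerow(header)
        rows_written = 0
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Commas inside quoted CSV fields survive the conversion, because only the delimiter changes.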

### Radboud/MUMC format
The filename must contain the word "radboud". It is a tab separated file without a header, it should contain columns in
@@ -83,33 +84,36 @@ A tab separated file with the following columns: `"refseq_build", "chromosome",
"geneid", "cDNA", "Protein"`.

### Run the pipeline
Remove the error files of the last export from the result folder. Run `MySpringBootApplication` in `IntelliJ` and place
the lab files one by one in the inbox (place the next if the previous one is reported to be done). After running the
Remove the error files of the last export from the result folder. Run `MySpringBootApplication` in `IntelliJ` or, if you don't have `IntelliJ` installed, run `mvn clean spring-boot:run` (this only works with Java 8), and place
the lab files one by one in the inbox (`data-transform-vkgl/src/test/inbox`); place the next file only after the previous one is reported to be done. After running the
pipeline, several files will be produced for each lab:

| File | Description |
|---------------------------|---------------------------------------------------------------------------------- |
|`vkgl_*labname*.tsv` | File with the data mapped to the generic VKGL data model |
|`*labname*.txt` | File with the raw data plus the columns generated by the pipeline |
|`vkgl_*labname*_error.txt` | File with the errors that were filtered out because they are invalid or duplicate |
| File | Description |
|---------------------------|------------------------------------------------------------------------------------|
|`vkgl_*labname*.tsv`       | File with the data mapped to the generic VKGL data model (errors excluded)         |
|`*labname*.txt`            | File with the raw data plus the columns generated by the pipeline (errors excluded)|
|`vkgl_*labname*_error.txt` | File with the errors that were filtered out because they are invalid or duplicate |

Now it's time to clean up the tables with raw data in your MOLGENIS instance:
```
mcmd run vkgl_cleanup_raw_labs
mcmd run vkgl_cleanup_labs_enriched_raw_data
```

The raw files should be renamed to: `vkgl_raw_*labname*_v2.tsv` and placed in the `output` folder of this tool
(`molgenis-py-consensus`). Upload them:
(`molgenis-py-consensus`).
Upload them:
```
mcmd run vkgl_upload_raw_labs
mcmd run vkgl_upload_labs_enriched_raw_data
```

The `vkgl_*labname*.tsv`
should be moved to the `input` folder of this tool (`molgenis-py-consensus`). The error file can be send to the labs
after the export is done.
should be moved to the `input` folder of this tool (`molgenis-py-consensus`).
The error file can be sent to the labs after the export is done.

The `process_result_files.sh` script performs the renaming and the moves to the `output` and `input` folders described above automatically. It also produces a file with counts per file.
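`process_result_files.sh` itself is not part of this diff. The Python sketch below only illustrates the renaming and moving described above. Several details are assumptions: the function names, the `result_dir`/`tool_dir` layout, and the idea that "counts" means data rows after a single header line.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Tuple

Move = Tuple[Path, Path]


def plan_result_moves(lab: str, result_dir: Path, tool_dir: Path) -> List[Move]:
    """Return (source, destination) pairs for one lab's pipeline results.

    Hypothetical layout, following the README text:
      *labname*.txt      -> <tool_dir>/output/vkgl_raw_*labname*_v2.tsv
      vkgl_*labname*.tsv -> <tool_dir>/input/vkgl_*labname*.tsv
    The error file is left in result_dir so it can be sent to the lab later.
    """
    return [
        (result_dir / f"{lab}.txt", tool_dir / "output" / f"vkgl_raw_{lab}_v2.tsv"),
        (result_dir / f"vkgl_{lab}.tsv", tool_dir / "input" / f"vkgl_{lab}.tsv"),
    ]


def count_data_rows(path: Path) -> int:
    """Count lines after the first one (assumes one header line, no embedded newlines)."""
    with path.open(encoding="utf-8") as handle:
        return max(sum(1 for _ in handle) - 1, 0)


def apply_moves(moves: List[Move]) -> Dict[str, int]:
    """Move each file into place and return the data-row count per destination name."""
    counts: Dict[str, int] = {}
    for src, dst in moves:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        counts[dst.name] = count_data_rows(dst)
    return counts
```

Keeping the planning step as a pure function makes it easy to review the target names before any file is moved.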

Now go to the `preprocessing` folder of this tool and run `PreProcessor.py`. Make sure your config file is correctly
set.
set. This script creates the file `vkgl_comments.tsv` in the output folder of the pipeline.

## 2. Add last export to history table
At this point, please make sure you transported the lines of the previous consensus table to the
@@ -214,10 +218,28 @@ Remove message from homepage.
Report to the labs that the export is finished, let them know which errors were found for their lab and which conflicts
(`vkgl_opposites_report_*yymm of export*.txt`) were found in the consensus table.

Send the raw Radboud/MUMC file and the raw files from the `Alissa` labs to LUMC to update LOVD and LOVD+.

## Checklist
This export is a whole process. To make sure everything is done, use this checklist:
- [ ] Delete data from vkgl_raw_'lab' tables in MOLGENIS:
- [ ] AMC
- [ ] Erasmus
- [ ] LUMC
- [ ] NKI
- [ ] Radboud/MUMC
- [ ] UMCG
- [ ] UMCU
- [ ] VUMC
- [ ] Download Alissa files to MOLGENIS
- [ ] Download raw tables from MOLGENIS
- [ ] AMC
- [ ] Erasmus
- [ ] NKI
- [ ] UMCG
- [ ] UMCU
- [ ] VUMC
- [ ] Import LUMC and Radboud/MUMC data into vkgl_raw_'lab' tables in MOLGENIS (optional)
- [ ] Download raw tables from MOLGENIS (and store as tab-delimited files (.txt)
- Process raw data for each lab:
- [ ] AMC
- [ ] Erasmus
@@ -273,4 +295,4 @@ python3 consensus


## Pipeline code diagram
![alt text](diagrams/code.svg "Code diagram")
![alt text](diagrams/code.svg "Code diagram")
29 changes: 29 additions & 0 deletions download_raw_lab_files.sh
@@ -0,0 +1,29 @@
#!/usr/bin/env bash
# Script to download the Alissa data from MOLGENIS.
# The raw data of LUMC and Radboud/MUMC are also in MOLGENIS, but with different
# headers, so use the files sent by e-mail for those labs instead.
#
# The commander can export .xlsx or .zip files. The .zip contains a .tsv file,
# so download a .zip, unzip it and rename the .tsv to .txt.
set -euo pipefail

downloader=/Users/dieuwke.roelofs-prins/molgenis/tools/emx-downloader/downloader.jar
output_folder=export_jun-2020/raw_lab_files
labs="amc erasmus nki umcg umcu vumc"
url=https://molgenis122.gcc.rug.nl/
account=admin
if [ "$#" -lt 1 ] || [ -z "$1" ]; then
  echo "Usage: $0 <password for $account on $url>" >&2
  exit 1
fi
password=$1
mkdir -p "$output_folder"

for lab in $labs; do
  output_file="$output_folder/vkgl_raw_$lab.zip"
  entity="vkgl_raw_$lab"
  java -jar "$downloader" -f "$output_file" -u "$url" -a "$account" -p "$password" -D -s 10000 "$entity"
  unzip -d "$output_folder" "$output_file"
  mv "${output_file%.zip}.tsv" "${output_file%.zip}.txt"
  rm "$output_file"
done
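If `unzip` is not available, the unzip, rename and remove steps of the loop can be reproduced with Python's standard `zipfile` module. Like the script, the hypothetical `extract_as_txt` helper below assumes that each archive contains a top-level `vkgl_raw_<lab>.tsv` named after the archive itself.

```python
import zipfile
from pathlib import Path


def extract_as_txt(zip_path: Path, output_folder: Path) -> Path:
    """Extract vkgl_raw_<lab>.tsv from zip_path into output_folder as a .txt file.

    Mirrors the unzip/mv/rm steps of download_raw_lab_files.sh (hypothetical helper).
    """
    tsv_name = zip_path.with_suffix(".tsv").name  # e.g. vkgl_raw_amc.tsv
    txt_path = output_folder / zip_path.with_suffix(".txt").name
    with zipfile.ZipFile(zip_path) as archive:
        # read() raises KeyError if the expected member is not in the archive
        txt_path.write_bytes(archive.read(tsv_name))
    zip_path.unlink()  # same as `rm "$output_file"`
    return txt_path
```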
@@ -1,8 +1,8 @@
delete --data vkgl_raw_amc_v2 -f
delete --data vkgl_raw_radboud_v2 -f
delete --data vkgl_raw_lumc_v2 -f
delete --data vkgl_raw_erasmus_v2 -f
delete --data vkgl_raw_radboud_v2 -f
delete --data vkgl_raw_lumc_v2 -f
delete --data vkgl_raw_nki_v2 -f
delete --data vkgl_raw_radboud_v2 -f
delete --data vkgl_raw_umcg_v2 -f
delete --data vkgl_raw_vumc_v2 -f
delete --data vkgl_raw_umcu_v2 -f
delete --data vkgl_raw_vumc_v2 -f
8 changes: 8 additions & 0 deletions mcmd_scripts/vkgl_cleanup_labs_raw_data
@@ -0,0 +1,8 @@
delete --data vkgl_raw_amc -f
delete --data vkgl_raw_erasmus -f
delete --data vkgl_raw_lumc -f
delete --data vkgl_raw_nki -f
delete --data vkgl_raw_radboud -f
delete --data vkgl_raw_umcg -f
delete --data vkgl_raw_umcu -f
delete --data vkgl_raw_vumc -f
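The cleanup script above is simply one `delete --data <table> -f` line per lab, in alphabetical order. A small generator such as the following could keep these `mcmd` scripts consistent with a single lab list. It is hypothetical and not part of this repository. The `suffix` parameter covers the `_v2` table names used by the other scripts.

```python
from typing import Iterable

# Lab identifiers as used in the vkgl_raw_<lab> table names in this commit.
LABS = ["amc", "erasmus", "lumc", "nki", "radboud", "umcg", "umcu", "vumc"]


def cleanup_script(labs: Iterable[str], suffix: str = "") -> str:
    """Return one mcmd `delete --data` line per lab table, sorted alphabetically."""
    return "".join(f"delete --data vkgl_raw_{lab}{suffix} -f\n" for lab in sorted(labs))


def upload_script(labs: Iterable[str], suffix: str = "_v2") -> str:
    """Return one mcmd `import` line per raw lab file, sorted alphabetically."""
    return "".join(f"import vkgl_raw_{lab}{suffix}.tsv\n" for lab in sorted(labs))
```

With the lab list shown, `cleanup_script(LABS)` produces the same eight lines as `mcmd_scripts/vkgl_cleanup_labs_raw_data`.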
@@ -1,8 +1,8 @@
import vkgl_raw_amc_v2.tsv
import vkgl_raw_erasmus_v2.tsv
import vkgl_raw_radboud_v2.tsv
import vkgl_raw_lumc_v2.tsv
import vkgl_raw_nki_v2.tsv
import vkgl_raw_radboud_v2.tsv
import vkgl_raw_umcg_v2.tsv
import vkgl_raw_umcu_v2.tsv
import vkgl_raw_vumc_v2.tsv
2 changes: 1 addition & 1 deletion sonar-project.properties
@@ -1,7 +1,7 @@
sonar.organization=molgenis
sonar.projectKey=org.molgenis:python-consensus
sonar.login=${env.SONAR_TOKEN}
sonar.sources=molgenis
sonar.sources=consensus
sonar.language=py
sonar.sourceEncoding=UTF-8
sonar.host.url=https://sonarcloud.io