Merge branch 'master' of https://github.com/molgenis/molgenis-py-consensus into fix/history
molgenis · Sep 3, 2020 · c33f402
2 parents 109f85c + f1edf12
commit c33f402
Showing 7 changed files with 84 additions and 22 deletions.
diff --git a/Jenkinsfile b/Jenkinsfile
@@ -35,6 +35,9 @@ pipeline {
                     sh "python setup.py test"
                     sh "pip install ."
                 }
+                container('sonar') {
+                    sh "sonar-scanner -Dsonar.github.oauth=${env.GITHUB_TOKEN} -Dsonar.pullrequest.base=${CHANGE_TARGET} -Dsonar.pullrequest.branch=${BRANCH_NAME} -Dsonar.pullrequest.key=${env.CHANGE_ID} -Dsonar.pullrequest.provider=GitHub -Dsonar.pullrequest.github.repository=molgenis/molgenis-py-consensus"
+                }
             }
         }
         stage('Build: [ master ]') {

diff --git a/README.md b/README.md
@@ -71,7 +71,8 @@ The header of the tab separated file contains the following values: `"timestamp"
 "classification", "last_updated_by", "last_updated_on"`. Except from `"timestamp"` and `"id"`, these are the columns as 
 delivered from Alissa Interpret. They are first imported into MOLGENIS using the "Amazon bucket file ingest"
 feature in the [MOLGENIS scheduled jobs plugin](https://molgenis.gitbooks.io/molgenis/content/guide-schedule.html).
-From there the files are downloaded as csv and put into the inbox folder of the pipeline. 
+From there the files are downloaded as csv, converted into tab-delimited (.txt) files, and put into the inbox folder of the pipeline. The script `download_raw_lab_files.sh` downloads the Alissa data automatically.
+Before starting the file ingest, make sure that the `vkgl_raw_*labname*` tables are empty.
 
 ### Radboud/MUMC format
 The filename must contain the word "radboud". It is a tab separated file without a header, it should contain columns in
@@ -83,33 +84,36 @@ A tab separated file with the following columns: `"refseq_build", "chromosome",
 "geneid", "cDNA", "Protein"`.
 
 ### Run the pipeline  
-Remove the error files of the last export from the result folder. Run `MySpringBootApplication` in `IntelliJ` and place 
-the lab files one by one in the inbox (place the next if the previous one is reported to be done). After running the 
+Remove the error files of the last export from the result folder. Run `MySpringBootApplication` in `IntelliJ` or, if you don't have `IntelliJ` installed, run `mvn clean spring-boot:run` (this only works with Java 8). Then place 
+the lab files one by one in the inbox (`data-transform-vkgl/src/test/inbox`), adding the next file only once the previous one is reported as done. After running the 
 pipeline several files will be produced for each lab: 
 
-| File                      | Description                                                                       |
-|---------------------------|---------------------------------------------------------------------------------- |
-|`vkgl_*labname*.tsv`       | File with the data mapped to the generic VKGL data model                          |
-|`*labname*.txt`            | File with the raw data plus the columns generated by the pipeline                 |
-|`vkgl_*labname*_error.txt` | File with the errors that were filtered out because they are invalid or duplicate |
+| File                      | Description                                                                        |
+|---------------------------|------------------------------------------------------------------------------------|
+|`vkgl_*labname*.tsv`       | File with the data mapped to the generic VKGL data model (excl the errors)         |
+|`*labname*.txt`            | File with the raw data plus the columns generated by the pipeline (excl the errors)|
+|`vkgl_*labname*_error.txt` | File with the errors that were filtered out because they are invalid or duplicate  |
 
 Now it's time to cleanup the tables with raw data in your MOLGENIS instance:
 ```
-mcmd run vkgl_cleanup_raw_labs
+mcmd run vkgl_cleanup_labs_enriched_raw_data
 ```
 
 The raw files should be renamed to: `vkgl_raw_*labname*_v2.tsv` and placed in the `output` folder of this tool 
-(`molgenis-py-consensus`). Upload them:
+(`molgenis-py-consensus`).
+Upload them:
 ```
-mcmd run vkgl_upload_raw_labs
+mcmd run vkgl_upload_labs_enriched_raw_data
 ```
 
 The `vkgl_*labname*.tsv`
-should be moved to the `input` folder of this tool (`molgenis-py-consensus`). The error file can be send to the labs
-after the export is done.
+should be moved to the `input` folder of this tool (`molgenis-py-consensus`).
+The error file can be sent to the labs after the export is done.
+
+The `process_result_files.sh` script performs the renaming and the moves to the `output` and `input` folders described above automatically. It also produces a file with the counts per file.
 
 Now go to the `preprocessing` folder of this tool and run `PreProcessor.py`. Make sure your config file is correctly 
-set.
+set. This script creates the file `vkgl_comments.tsv` in the output folder of the pipeline. 
 
 ## 2. Add last export to history table
 At this point, please make sure you transported the lines of the previous consensus table to the
@@ -214,10 +218,28 @@ Remove message from homepage.
 Report to the labs that the export is finished, let them know which errors were found for their lab and which conflicts
 (`vkgl_opposites_report_*yymm of export*.txt`) were found in the consensus table.
 
+Send the raw Radboud/MUMC file and the raw files from the `Alissa` labs to LUMC to update LOVD and LOVD+.
+
 ## Checklist
 This export is a whole process. To make sure everything is done, use this checklist:
+- [ ] Delete data from vkgl_raw_'lab' tables in MOLGENIS:
+    - [ ] AMC
+    - [ ] Erasmus
+    - [ ] LUMC
+    - [ ] NKI
+    - [ ] Radboud/MUMC
+    - [ ] UMCG
+    - [ ] UMCU
+    - [ ] VUMC
 - [ ] Download Alissa files to MOLGENIS
-- [ ] Download raw tables from MOLGENIS
+    - [ ] AMC
+    - [ ] Erasmus
+    - [ ] NKI
+    - [ ] UMCG
+    - [ ] UMCU
+    - [ ] VUMC
+- [ ] Import LUMC and Radboud/MUMC data into vkgl_raw_'lab' tables in MOLGENIS (optional)
+- [ ] Download raw tables from MOLGENIS (and store them as tab-delimited .txt files)
 - Process raw data for each lab:
     - [ ] AMC
     - [ ] Erasmus
@@ -273,4 +295,4 @@ python3 consensus
 
 
 ## Pipeline code diagram
-![alt text](diagrams/code.svg "Code diagram")
+![alt text](diagrams/code.svg "Code diagram")
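
The README changes above describe how the pipeline result files are renamed and moved. A minimal Python sketch of that mapping follows. It is not the contents of `process_result_files.sh`, which this diff does not show. The function name `plan_result_file` is hypothetical, and it assumes the files are named exactly after the README patterns and that "the raw files" means the `*labname*.txt` files.

```python
def plan_result_file(filename, lab):
    """Return (target_folder, target_name) for one pipeline result file.

    Returns None for files that stay where they are, such as the error file,
    which the README says is sent to the labs instead of being moved.
    """
    if filename == f"vkgl_{lab}.tsv":
        # Data mapped to the generic VKGL data model: goes to the input folder.
        return ("input", filename)
    if filename == f"{lab}.txt":
        # Assumption: this is the "raw file" that the README says to rename.
        return ("output", f"vkgl_raw_{lab}_v2.tsv")
    return None


print(plan_result_file("amc.txt", "amc"))
```

Keeping the mapping in a pure function makes it easy to check before any file is actually moved. The per-file counts that the script also produces are not covered here.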
diff --git a/download_raw_lab_files.sh b/download_raw_lab_files.sh
@@ -0,0 +1,29 @@
+# Script to download the Alissa data from Molgenis
+# Raw data of lumc and radboudmumc is in Molgenis, but with different headers,
+# so use the files sent by e-mail
+
+# File extension options for the commander are .xlsx and .zip
+# In the zip-file a .tsv is created => therefore download a zip-file,
+# unzip it and rename the .tsv to a .txt
+
+downloader=/Users/dieuwke.roelofs-prins/molgenis/tools/emx-downloader/downloader.jar
+output_folder=export_jun-2020/raw_lab_files
+labs="amc erasmus nki umcg umcu vumc"
+url=https://molgenis122.gcc.rug.nl/
+account=admin
+if [ -z "$1" ]; then
+    echo "Start the script with the password for $url"
+    exit 1
+fi
+
+pwd=$1
+
+for lab in $labs;
+ do
+   output_file="$output_folder/vkgl_raw_$lab.zip"
+   entity="vkgl_raw_$lab"
+   java -jar "$downloader" -f "$output_file" -u "$url" -a "$account" -p "$pwd" -D -s 10000 "$entity"
+   unzip -d "$output_folder" "$output_file"
+   mv "${output_file%.zip}.tsv" "${output_file%.zip}.txt"
+   rm "$output_file"
+done
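
For comparison, the per-lab step of the script above can be sketched in Python. This is an illustration only, not part of the commit. The function name `build_download_command` is hypothetical, and the flags are copied from the shell script rather than checked against the downloader's own documentation.

```python
from pathlib import Path

# Lab list copied from download_raw_lab_files.sh.
LABS = ["amc", "erasmus", "nki", "umcg", "umcu", "vumc"]


def build_download_command(lab, downloader, output_folder, url, account, password):
    """Return the downloader argument list and the final .txt path for one lab."""
    entity = f"vkgl_raw_{lab}"
    zip_path = Path(output_folder) / f"{entity}.zip"
    command = [
        "java", "-jar", str(downloader),
        "-f", str(zip_path),
        "-u", url,
        "-a", account,
        "-p", password,
        "-D", "-s", "10000",
        entity,
    ]
    # with_suffix only changes the extension, so a "zip" elsewhere in the
    # path is left alone.
    return command, zip_path.with_suffix(".txt")


command, txt_path = build_download_command(
    "amc", "downloader.jar", "raw_lab_files", "https://example.org/", "admin", "secret"
)
print(txt_path.name)
```

Passing the argument list to `subprocess.run` would avoid shell word splitting, so paths or passwords containing spaces need no extra quoting.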
diff --git a/mcmd_scripts/vkgl_cleanup_raw_labs → ...ripts/vkgl_cleanup_labs_enriched_raw_data b/mcmd_scripts/vkgl_cleanup_raw_labs → ...ripts/vkgl_cleanup_labs_enriched_raw_data
@@ -1,8 +1,8 @@
 delete --data vkgl_raw_amc_v2 -f
-delete --data vkgl_raw_radboud_v2 -f
-delete --data vkgl_raw_lumc_v2 -f
 delete --data vkgl_raw_erasmus_v2 -f
-delete --data vkgl_raw_radboud_v2 -f
+delete --data vkgl_raw_lumc_v2 -f
 delete --data vkgl_raw_nki_v2 -f
+delete --data vkgl_raw_radboud_v2 -f
 delete --data vkgl_raw_umcg_v2 -f
-delete --data vkgl_raw_vumc_v2 -f
+delete --data vkgl_raw_umcu_v2 -f
+delete --data vkgl_raw_vumc_v2 -f
diff --git a/mcmd_scripts/vkgl_cleanup_labs_raw_data b/mcmd_scripts/vkgl_cleanup_labs_raw_data
@@ -0,0 +1,8 @@
+delete --data vkgl_raw_amc -f
+delete --data vkgl_raw_erasmus -f
+delete --data vkgl_raw_lumc -f
+delete --data vkgl_raw_nki -f
+delete --data vkgl_raw_radboud -f
+delete --data vkgl_raw_umcg -f
+delete --data vkgl_raw_umcu -f
+delete --data vkgl_raw_vumc -f
diff --git a/mcmd_scripts/vkgl_upload_raw_labs → ...cripts/vkgl_upload_labs_enriched_raw_data b/mcmd_scripts/vkgl_upload_raw_labs → ...cripts/vkgl_upload_labs_enriched_raw_data
@@ -1,8 +1,8 @@
 import vkgl_raw_amc_v2.tsv
 import vkgl_raw_erasmus_v2.tsv
-import vkgl_raw_radboud_v2.tsv
 import vkgl_raw_lumc_v2.tsv
 import vkgl_raw_nki_v2.tsv
+import vkgl_raw_radboud_v2.tsv
 import vkgl_raw_umcg_v2.tsv
 import vkgl_raw_umcu_v2.tsv
 import vkgl_raw_vumc_v2.tsv
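
The three mcmd scripts above repeat the same lab list, and the old cleanup script listed `radboud` twice while omitting `umcu`. One way to avoid that kind of drift is to generate the command lines from a single list. The sketch below is an assumption about how that could look, not something this repository contains. The helper names `cleanup_script` and `upload_script` are hypothetical.

```python
# Lab names as they appear in the mcmd scripts of this commit.
LABS = ["amc", "erasmus", "lumc", "nki", "radboud", "umcg", "umcu", "vumc"]


def cleanup_script(labs, suffix=""):
    """Build one 'delete --data' line per lab, de-duplicated and sorted."""
    return "\n".join(
        f"delete --data vkgl_raw_{lab}{suffix} -f" for lab in sorted(set(labs))
    )


def upload_script(labs, suffix="_v2"):
    """Build one 'import' line per lab, de-duplicated and sorted."""
    return "\n".join(
        f"import vkgl_raw_{lab}{suffix}.tsv" for lab in sorted(set(labs))
    )


print(cleanup_script(LABS, suffix="_v2"))
```

Calling `cleanup_script(LABS)` without a suffix yields the lines of `vkgl_cleanup_labs_raw_data`, and `upload_script(LABS)` yields those of `vkgl_upload_labs_enriched_raw_data`.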
diff --git a/sonar-project.properties b/sonar-project.properties
@@ -1,7 +1,7 @@
 sonar.organization=molgenis
 sonar.projectKey=org.molgenis:python-consensus
 sonar.login=${env.SONAR_TOKEN}
-sonar.sources=molgenis
+sonar.sources=consensus
 sonar.language=py
 sonar.sourceEncoding=UTF-8
 sonar.host.url=https://sonarcloud.io