Fix remaining relative paths on load #4388

Closed
wants to merge 72 commits into from
72 commits
2b00b74
Fix #2693. Add subject_id and phenotype_groups to case parsing.
dnil Jan 29, 2024
5386c3f
Fix #1394 - convert remaining relative file paths to abs on load
dnil Jan 31, 2024
b13ebed
Fix a few imports
dnil Jan 31, 2024
72bdeee
actual items would be good
dnil Feb 1, 2024
6035e20
Avoid another missing key
dnil Feb 1, 2024
5594df9
Naming..
dnil Feb 1, 2024
039fe87
unused
dnil Feb 1, 2024
53af42f
vcf_files is optional
dnil Feb 1, 2024
3c13a9c
again, vcf_files is optional
dnil Feb 1, 2024
0b934c4
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 1, 2024
5cb0792
actual items...
dnil Feb 1, 2024
3fde26a
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 1, 2024
f21974c
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 2, 2024
e3d859b
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 2, 2024
522e1e3
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 5, 2024
a52f4c1
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 7, 2024
b176bf4
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 7, 2024
b98e548
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 8, 2024
1e129fe
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 8, 2024
440af10
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 8, 2024
7f0c222
Merge branch 'main' into fix_rel_paths
dnil Feb 9, 2024
97ec74b
Merge branch 'main' into fix_rel_paths
dnil Feb 9, 2024
3720672
Merge branch 'main' into fix_rel_paths
dnil Feb 12, 2024
83ee0f9
Merge branch 'main' into fix_rel_paths
TereseBo Feb 15, 2024
6dee4d3
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 16, 2024
bc114d5
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 19, 2024
e785414
Merge branch 'main' into fix_rel_paths
dnil Feb 19, 2024
2d62a07
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 19, 2024
71f4e8e
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 19, 2024
6a294d6
Merge branch 'main' into fix_rel_paths
dnil Feb 19, 2024
65af3e1
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 19, 2024
d868d57
Merge branch 'main' into fix_rel_paths
dnil Feb 20, 2024
42d095b
Merge branch 'main' into fix_rel_paths
dnil Feb 20, 2024
183872d
Merge branch 'main' into fix_rel_paths
dnil Feb 20, 2024
021b980
Merge branch 'main' into fix_rel_paths
dnil Feb 20, 2024
971f500
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 20, 2024
158be7a
Merge branch 'main' into fix_rel_paths
TereseBo Feb 20, 2024
ca434da
Fixed changelog after latest merge with main
TereseBo Feb 20, 2024
0a5a370
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 20, 2024
9aee502
Merge branch 'main' into fix_rel_paths
dnil Feb 21, 2024
f84a6b9
Merge branch 'main' into fix_rel_paths
dnil Feb 21, 2024
13573eb
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 27, 2024
4d57c86
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 27, 2024
1fe96de
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 28, 2024
4d45965
Merge branch 'main' into fix_rel_paths
northwestwitch Feb 28, 2024
2aeaa0f
Merge branch 'main' into fix_rel_paths
dnil Feb 29, 2024
3439da0
Merge branch 'main' into fix_rel_paths
TereseBo Mar 1, 2024
c696f8f
Merge branch 'main' into fix_rel_paths
TereseBo Mar 4, 2024
77ef797
Merge branch 'main' into fix_rel_paths
TereseBo Mar 4, 2024
b956a8c
Merge branch 'main' into fix_rel_paths
dnil Mar 4, 2024
d3cc02f
Merge branch 'main' into fix_rel_paths
northwestwitch Mar 5, 2024
a52345c
Merge branch 'main' into fix_rel_paths
northwestwitch Mar 5, 2024
e5dab6f
Merge branch 'main' into fix_rel_paths
dnil Mar 5, 2024
5923ee8
Merge branch 'main' into fix_rel_paths
northwestwitch Mar 5, 2024
339b42c
Merge branch 'main' into fix_rel_paths
northwestwitch Mar 7, 2024
d5a1752
Merge branch 'main' into fix_rel_paths
dnil Mar 7, 2024
727af72
Merge branch 'main' into fix_rel_paths
dnil Mar 8, 2024
6ca051c
Merge branch 'main' into fix_rel_paths
TereseBo Mar 11, 2024
116c6f9
Merge branch 'main' into fix_rel_paths
dnil Mar 12, 2024
8de6f60
Merge branch 'main' into fix_rel_paths
dnil Mar 12, 2024
cb390ab
Merge branch 'main' into fix_rel_paths
northwestwitch Mar 13, 2024
562dc55
Merge branch 'main' into fix_rel_paths
TereseBo Mar 13, 2024
e3a6d45
Merge branch 'main' into fix_rel_paths
TereseBo Mar 15, 2024
aa28c24
Merge branch 'main' into fix_rel_paths
northwestwitch Mar 15, 2024
4592186
Merge branch 'main' into fix_rel_paths
TereseBo Mar 18, 2024
0a5012d
Merge branch 'main' into fix_rel_paths
northwestwitch Mar 19, 2024
7bb427e
Merge branch 'main' into fix_rel_paths
dnil Mar 20, 2024
2c97cc1
Merge branch 'main' into fix_rel_paths
northwestwitch Mar 20, 2024
9b04de4
Merge branch 'main' into fix_rel_paths
northwestwitch Mar 21, 2024
0604cdf
Merge branch 'main' into fix_rel_paths
northwestwitch Mar 22, 2024
cbaf183
Merge branch 'main' into fix_rel_paths
northwestwitch Mar 25, 2024
3b88047
Merge branch 'main' into fix_rel_paths
TereseBo Mar 25, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -64,6 +64,7 @@ About changelog [here](https://keepachangelog.com/en/1.0.0/)
- Default loglevel up to INFO, making logs with default start easier to read
- Add XTR region to PAR region definition
- Diagnoses can be searched on diagnoses page without waiting for load first
- Explicitly store case file paths from load conf accessible on load as absolute paths for later access
### Fixed
- Removed log info showing hgnc IDs used in variantS search
- Maintain Matchmaker Exchange and Beacon submission status when a case is re-uploaded
4 changes: 3 additions & 1 deletion scout/adapter/mongo/case.py
@@ -908,7 +908,9 @@ def load_case(self, config_data: dict, update: bool = False, keep_actions: bool
try:
for vcf_file in files:
# Check if file exists
if not case_obj["vcf_files"].get(vcf_file["file_name"]):
if not case_obj.get("vcf_files") or not case_obj["vcf_files"].get(
vcf_file["file_name"]
):
LOG.debug("didn't find {}, skipping".format(vcf_file["file_name"]))
continue
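
The guard in this hunk has to cope with `vcf_files` being absent or `None` on the case object. A minimal sketch of that pattern follows; `has_vcf_file` is a hypothetical helper written for illustration and is not part of Scout.

```python
from typing import Optional


def has_vcf_file(case_obj: dict, file_name: str) -> bool:
    """Return True only if a truthy path is stored under vcf_files[file_name].

    Hypothetical helper for illustration; not part of the Scout code base.
    """
    vcf_files: Optional[dict] = case_obj.get("vcf_files")
    # `or {}` covers both a missing key and an explicit None value
    return bool((vcf_files or {}).get(file_name))
```

Collapsing the two checks into `(vcf_files or {}).get(...)` is the same idea that `build_case` uses further down when it derives the `has_*variants` flags.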

76 changes: 64 additions & 12 deletions scout/build/case.py
@@ -1,6 +1,7 @@
import logging
import os
from datetime import datetime
from typing import Dict
from typing import Dict, List, Optional, Union

from scout.constants import CUSTOM_CASE_REPORTS, PHENOTYPE_GROUPS
from scout.exceptions import ConfigError, IntegrityError
@@ -42,6 +43,59 @@ def _populate_pipeline_info(case_obj, case_data):
case_obj["pipeline_version"] = case_data["exe_ver"]


def set_abspath_case_file(case_obj: dict, case_data: dict, case_file: str):
    """Abs path case file. E.g. demo files appear as relative path files, and storing absolute paths
    ensures we do not have to load cases from the same working directory as we start the server process.
    """
    file_path = case_data.get(case_file)
    if file_path and os.path.exists(file_path):
        case_obj[case_file] = os.path.abspath(file_path)
    else:
        case_obj[case_file] = None


def set_abspath_case_nested_files(case_obj: dict, case_data: dict, nested_file_key: str):
    """Absolute path nested case files. Similar to the single abs path setter, but some paths reside nested
    directly under a particular key.
    """
    case_obj[nested_file_key] = case_data.get(nested_file_key)
    if case_obj.get(nested_file_key):
        for file_type, nested_file_item in case_obj[nested_file_key].items():
            if nested_file_item and os.path.exists(nested_file_item):
                case_obj[nested_file_key][file_type] = os.path.abspath(nested_file_item)
            else:
                case_obj[nested_file_key][file_type] = None


def set_abspath_case_nested_image_files(
    case_obj: dict,
    case_data: dict,
    nested_file_key: Optional[str] = "custom_images",
    path_key: Optional[str] = "path",
):
    """Absolute path for complexly nested custom image paths. Similar to the single abs path setter, but the custom image paths reside nested
    in arrays, with the sub-key name "path" one or two levels into the dictionary.

    E.g. case["custom_images"]["str_variants_images"][2]["path"] or
    case["custom_images"]["case_images"]["section_one"][1]["path"].
    """

    def case_images_abspath(level: Union[Dict, List]):
        """Recursively set path to abs_path for all path_key items in lists in dicts in level."""
        for sub_level_key, sub_level in level.items():
            if isinstance(sub_level, dict):
                case_images_abspath(sub_level)
            elif isinstance(sub_level, list):
                for image in sub_level:
                    image_path = image.get(path_key)
                    if image_path and os.path.exists(image_path):
                        image[path_key] = os.path.abspath(image_path)

    if case_data.get(nested_file_key):
        case_obj[nested_file_key] = case_data.get(nested_file_key)
        case_images_abspath(case_obj[nested_file_key])


def build_case(case_data, adapter):
"""Build a case object that is to be inserted to the database

@@ -278,24 +332,22 @@ def build_case(case_data, adapter):
# Files
case_obj["madeline_info"] = case_data.get("madeline_info")

case_obj["custom_images"] = case_data.get("custom_images")
set_abspath_case_nested_image_files(case_obj, case_data)

set_abspath_case_file(case_obj, case_data, "delivery_report")

for report_key in [report.get("key_name") for report in CUSTOM_CASE_REPORTS.values()]:
if report_key in case_data:
case_obj[report_key] = case_data.get(report_key)
set_abspath_case_file(case_obj, case_data, report_key)

case_obj["vcf_files"] = case_data.get("vcf_files", {})
case_obj["delivery_report"] = case_data.get("delivery_report")
set_abspath_case_nested_files(case_obj, case_data, "vcf_files")

_populate_pipeline_info(case_obj, case_data)

case_obj["has_svvariants"] = bool(
case_obj["vcf_files"].get("vcf_sv") or case_obj["vcf_files"].get("vcf_sv_research")
)

case_obj["has_strvariants"] = bool(case_obj["vcf_files"].get("vcf_str"))

case_obj["has_meivariants"] = bool(case_obj["vcf_files"].get("vcf_mei"))
vcf_files = case_obj.get("vcf_files") or {}
case_obj["has_svvariants"] = bool(vcf_files.get("vcf_sv") or vcf_files.get("vcf_sv_research"))
case_obj["has_strvariants"] = bool(vcf_files.get("vcf_str"))
case_obj["has_meivariants"] = bool(vcf_files.get("vcf_mei"))

case_obj["is_migrated"] = False
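
The recursive walk over `custom_images` can be tried in isolation. The sketch below is a standalone approximation of that walk, not the PR's function: `abspath_images` and `demo` are hypothetical names, and the file `image.png` is created in a temporary directory purely for the demonstration.

```python
import os
import tempfile


def abspath_images(level: dict, path_key: str = "path") -> None:
    """Recursively rewrite existing relative image paths as absolute paths, in place."""
    for sub_level in level.values():
        if isinstance(sub_level, dict):
            abspath_images(sub_level, path_key)
        elif isinstance(sub_level, list):
            for image in sub_level:
                image_path = image.get(path_key)
                # Paths that cannot be found are left untouched
                if image_path and os.path.exists(image_path):
                    image[path_key] = os.path.abspath(image_path)


def demo() -> dict:
    """Create a throwaway image file and run the walk from its directory."""
    start_dir = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp_dir:
        os.chdir(tmp_dir)
        try:
            open("image.png", "w").close()
            custom_images = {
                "str_variants_images": [{"path": "image.png"}],
                "case_images": {"section_one": [{"path": "missing.png"}]},
            }
            abspath_images(custom_images)
        finally:
            # Leave the temporary directory before it is removed
            os.chdir(start_dir)
    return custom_images
```

Note the difference from the flat file setters in this diff: they store `None` when a file is not found, whereas this walk leaves an unresolvable image path as it was.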

49 changes: 46 additions & 3 deletions scout/build/individual.py
@@ -8,6 +8,7 @@
BUILD_INDIVIDUAL_FILES = [
"bam_file",
"d4_file",
"mitodel_file",
"mt_bam",
"rhocall_bed",
"rhocall_wig",
@@ -21,6 +22,43 @@
]


def set_abspath_individual_file(ind_obj: dict, ind: dict, ind_file: str):
    """Fix absolute path for individual files to be served from application.
    This takes care of incomplete path for demo files. While most endpoints would attempt to make an
    abs path when sending, storing them as absolute if we can access them on cli load ensures that
    we can still find them from the web app later, even if that happens to be started in another working directory.

    This may close a loophole for some very particular use cases, but in general should be safer.
    """

    file_path = ind.get(ind_file)
    if file_path and os.path.exists(file_path):
        ind_obj[ind_file] = os.path.abspath(file_path)
    else:
        ind_obj[ind_file] = None


def set_abspath_nested_individual_files(ind_obj: dict, ind: dict, nested_file_key: str):
    """Fix absolute path for nested files to be served from application.
    For some of our more complicated nesting, e.g. Chromograph, the file endings are generalised later (in JS),
    and only a template is stored on the individual object. We then still wish to update the dirname, treating the
    basename lightly as a template without checking for file existence just yet. The endpoints will handle that later.
    """
    if ind.get(nested_file_key):
        ind_obj[nested_file_key] = ind.get(nested_file_key)
        # Iterate over (file type, path) pairs so that the path, not the key, is checked and rewritten
        for file_type, nested_file_item in ind_obj[nested_file_key].items():
            if nested_file_item:
                if os.path.exists(nested_file_item):
                    ind_obj[nested_file_key][file_type] = os.path.abspath(nested_file_item)
                    continue

                nested_file_item_dirname = os.path.dirname(nested_file_item)
                nested_file_item_basename = os.path.basename(nested_file_item)
                ind_obj[nested_file_key][file_type] = (
                    os.path.abspath(nested_file_item_dirname) + "/" + nested_file_item_basename
                )


def build_individual(ind: dict) -> dict:
"""Build an Individual object

@@ -91,8 +129,9 @@ def build_individual(ind: dict) -> dict:
except KeyError as err:
raise (PedigreeError("Unknown phenotype: %s" % phenotype))

# Fix absolute path for individual bam files (takes care of incomplete path for demo files)
for ind_file in BUILD_INDIVIDUAL_FILES:
set_abspath_individual_file(ind_obj, ind, ind_file)

file_path = ind.get(ind_file)
if file_path and os.path.exists(file_path):
ind_obj[ind_file] = os.path.abspath(file_path)
@@ -105,10 +144,14 @@ def build_individual(ind: dict) -> dict:
ind_obj["confirmed_sex"] = ind.get("confirmed_sex")
ind_obj["confirmed_parent"] = ind.get("confirmed_parent")
ind_obj["predicted_ancestry"] = ind.get("predicted_ancestry")

ind_obj["mitodel"] = ind.get("mitodel")

ind_obj["chromograph_images"] = ind.get("chromograph_images")
ind_obj["reviewer"] = ind.get("reviewer")
ind_obj["mitodel"] = ind.get("mitodel")
ind_obj["mitodel_file"] = ind.get("mitodel_file")

for nested_file_key in "chromograph_images", "reviewer":
set_abspath_nested_individual_files(ind_obj, ind, nested_file_key)

# Check if the analysis type is ok
analysis_type = ind.get("analysis_type", "unknown")
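
For template-style entries such as the Chromograph images, the docstring above describes making only the directory part absolute and keeping the basename as a prefix. A minimal sketch of that idea, assuming the nested value is a dict mapping a file type to a path or path prefix; `abspath_template` and `abspath_nested` are hypothetical names, not Scout functions.

```python
import os


def abspath_template(path: str) -> str:
    """Make the directory part of a path absolute and keep the basename untouched.

    The basename may be a template or prefix rather than an existing file,
    so no existence check is made here.
    """
    dirname = os.path.abspath(os.path.dirname(path))
    return os.path.join(dirname, os.path.basename(path))


def abspath_nested(files: dict) -> dict:
    """Return a copy of a {file_type: path} dict with every non-empty path made absolute."""
    return {
        file_type: abspath_template(path) if path else path
        for file_type, path in files.items()
    }
```

Returning a new dict rather than mutating in place is a choice made for this sketch only; the code in the diff updates `ind_obj` directly.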
2 changes: 1 addition & 1 deletion scripts/update_files_path.py
@@ -5,10 +5,10 @@
from pymongo import MongoClient
from tabulate import tabulate

from scout.build.individual import BUILD_INDIVIDUAL_FILES as INDIVIDUAL_FILES
from scout.constants import FILE_TYPE_MAP

VCF_FILES = FILE_TYPE_MAP.keys()
INDIVIDUAL_FILES = ["bam_file", "mt_bam", "vcf2cytosure"]


@click.command()
6 changes: 3 additions & 3 deletions tests/build/test_build_case.py
@@ -1,9 +1,9 @@
# -*- coding: utf-8 -*-

import pytest
from pprint import pprint as pp
from scout.exceptions import PedigreeError, ConfigError, IntegrityError

from scout.build import build_case
from scout.exceptions import ConfigError, IntegrityError, PedigreeError


def test_build_case(parsed_case, adapter, institute_obj, testpanel_obj):
@@ -54,7 +54,7 @@ def test_build_case(parsed_case, adapter, institute_obj, testpanel_obj):

assert case_obj["madeline_info"] == parsed_case["madeline_info"]

assert case_obj["delivery_report"] == parsed_case["delivery_report"]
assert parsed_case["delivery_report"] in case_obj["delivery_report"]

for vcf in case_obj["vcf_files"]:
assert vcf in parsed_case["vcf_files"]
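
The switch from `==` to `in` for `delivery_report` follows from the stored path now being absolute. A small sketch of why the containment check holds, assuming a plain relative path with no `./` or `..` components; the path used is hypothetical.

```python
import os

# Hypothetical relative path, as it might appear in a load config
relative_report = os.path.join("scout", "demo", "delivery_report.html")
absolute_report = os.path.abspath(relative_report)

# The two are no longer equal, but the relative path is a suffix of the absolute one
paths_differ = relative_report != absolute_report
relative_is_suffix = absolute_report.endswith(relative_report)
```

A path such as `./report.html` would be normalised by `os.path.abspath`, so the containment check would not hold for it.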