Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store demo case file paths as absolute paths #4517

Merged
merged 21 commits into from
Mar 26, 2024

Conversation

northwestwitch
Copy link
Member

@northwestwitch northwestwitch commented Mar 20, 2024

This PR adds a functionality or fixes a bug.

Testing on cg-vm1 server (Clinical Genomics Stockholm)

Prepare for testing

  1. Make sure the PR is pushed and available on Docker Hub
  2. Fist book your testing time using the Pax software available at https://pax.scilifelab.se/. The resource you are going to call dibs on is scout-stage and the server is cg-vm1.
  3. ssh <USER.NAME>@cg-vm1.scilifelab.se
  4. sudo -iu hiseq.clinical
  5. ssh localhost
  6. (optional) Find out which scout branch is currently deployed on cg-vm1: podman ps
  7. Stop the service with current deployed branch: systemctl --user stop scout.target
  8. Start the scout service with the branch to test: systemctl --user start scout@<this_branch>
  9. Make sure the branch is deployed: systemctl --user status scout.target
  10. After testing is done, repeat procedure at https://pax.scilifelab.se/, which will release the allocated resource (scout-stage) to be used for testing by other users.
Testing on hasta server (Clinical Genomics Stockholm)

Prepare for testing

  1. ssh <USER.NAME>@hasta.scilifelab.se
  2. Book your testing time using the Pax software. us; paxa -u <user> -s hasta -r scout-stage. You can also use the WSGI Pax app available at https://pax.scilifelab.se/.
  3. (optional) Find out which scout branch is currently deployed on cg-vm1: conda activate S_scout; pip freeze | grep scout-browser
  4. Deploy the branch to test: bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_scout -t scout -b <this_branch>
  5. Make sure the branch is deployed: us; scout --version
  6. After testing is done, repeat the paxa procedure, which will release the allocated resource (scout-stage) to be used for testing by other users.

How to test:

  1. Locally, run scout setup demo using main branch and this branch
  2. Check on MongoDB compass that file paths for demo cases are all absolute paths (on main branch they are relative paths)

Expected outcome:
The functionality should be working
Take a screenshot and attach or copy/paste the output.

Review:

  • code approved by DN
  • tests executed by CR, DN

@northwestwitch northwestwitch changed the title Store demo file paths as absolute paths Store demo case file paths as absolute paths Mar 20, 2024
@northwestwitch
Copy link
Member Author

northwestwitch commented Mar 20, 2024

Main branch:

image

This branch:

image

Copy link

codecov bot commented Mar 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.68%. Comparing base (8ceb4df) to head (10a3a36).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4517      +/-   ##
==========================================
+ Coverage   84.63%   84.68%   +0.04%     
==========================================
  Files         310      310              
  Lines       18598    18612      +14     
==========================================
+ Hits        15741    15761      +20     
+ Misses       2857     2851       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@northwestwitch northwestwitch marked this pull request as ready for review March 20, 2024 09:44
@dnil
Copy link
Collaborator

dnil commented Mar 20, 2024

I see the will here, and it is good, but this does not fully solve either #4399 (it starts it) or #1394, while it conflicts with #4388, plus its a duplication of effort which we have previous little time for. Let me eat something first, but Im currently not a fan. 😆

@northwestwitch
Copy link
Member Author

northwestwitch commented Mar 20, 2024

I see the will here, and it is good, but this does not fully solve either #4399 (it starts it) or #1394, while it conflicts with #4388, plus its a duplication of effort which we have previous little time for. Let me eat something first, but Im currently not a fan.

I don't really understand why we need to refactor case build and case loader (#4399) just to fix the fact that demo case doesn't have absolute but relative paths to the files? The purpose of pydantic is to validate and customise fields and this is just the case.
I think this is a simpler solution to the problem here. I also understood from the discussion in your PR that you agreed in a solution like this, instead of the code present in your PR? Perhaps I misunderstood?

@dnil
Copy link
Collaborator

dnil commented Mar 20, 2024

This is not just about the demo case. The problem is most obvious there, since all new devs/users encounter it. It affects all use of relative paths on loading. Some of these are being checked for at various places in the code base, with varying success.

I do not mind seeing a pydantic class fixing the issue, it is great for it: we just need to make the abstraction clear, and move over other similar tasks from build to it as well so one knows where to look for them. Adding one more task to what is currently primarily the pydantic load validator makes its role more confusing. Is it for validation? For initial load conversion to db? Or perhaps for reading db documents for display? As we discussed in #4388.

We can refactor it, but I don't see why we should do all that refactoring before merging the first PR? And I dont see why we should start by solving a part of the original issue with this PR?

@northwestwitch
Copy link
Member Author

northwestwitch commented Mar 20, 2024

This is not just about the demo case. The problem is most obvious there, since all new devs/users encounter it. It affects all use of relative paths on loading. Some of these are being checked for at various places in the code base, with varying success.

I do not mind seeing a pydantic class fixing the issue, it is great for it: we just need to make the abstraction clear, and move over other similar tasks from build to it as well so one knows where to look for them. Adding one more task to what is currently primarily the pydantic load validator makes its role more confusing. Is it for validation? For initial load conversion to db? Or perhaps for reading db documents for display? As we discussed in #4388.

We can refactor it, but I don't see why we should do all that refactoring before merging the first PR? And I dont see why we should start by solving a part of the original issue with this PR?

Mmm sorry but I don't really like the approach of PR #4388 to fix the issue. I agree with you that having the pydantic validation plus the case build is confusing and stuff might be refactored to be more clear. This probably comes from the fact that we had many checks in build before we switch to pydantic, but since now we have pydantic I think we should make the. most of it. In current main there is a method that already checks if the paths are valid. So instead of having one hundred lines of extra code to fix the paths outside the validation, you can have just 2 in that pydanic function that do just the same? It's way more easy to maintain as well I think?

@northwestwitch
Copy link
Member Author

I think in these case we might solve the impasse like this: we leave both PRs open and we see who gets tired to see them laying around and approves one first 🤣

@dnil
Copy link
Collaborator

dnil commented Mar 20, 2024

Right, neither is the solution we would like in the long run, and we could argue for a long time on wether its better to have something that solves the problem, and how complete an abstraction should be before introducing it. I think this is a multiplication of effort for no reason other than a realisation about a possible future path, but since you made it, I will review it and ask that it at least satisfies solving #1394. Once it works, we could then make sure to solve #4399 before the next release.

Copy link
Collaborator

@dnil dnil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks convincing on VCF, but still does not fix e.g. custom images path. If you want, you could step through the changes in #4388 for the places I know of.

Screenshot 2024-03-20 at 15 44 22

I haven't checked all places downstream that attempt to make an abs path for redundancy. I guess we may still want several of them to cover for old relative paths in the db.

The CaseLoader class now does part of what build_case() did before. I think the division you sketch is really good, but there are several equivalent sub functionalities left in build_case() that should move to CaseLoader to keep us from looking back and forth between the two.

Abstraction wise, we would want things that only needs to be done when parsing the config file in CaseLoader, and they should as per parsing in general also preferably not require the use of other db resources. Build would then handle the additional resources, and not have to worry about file integrity or syntax. We traditionally reserve dual use for the build function, so that they could sometimes be called when storing back items from prepped display object to db, but this is afaik not used for case objects - we only do field-by-field updates on them live.

@dnil dnil removed the Easy label Mar 20, 2024
@northwestwitch
Copy link
Member Author

@dnil I've fixed paths to images (cases and vars), chromograph and reviewer. Comparing with the demo case config it looks like that's all

@dnil
Copy link
Collaborator

dnil commented Mar 21, 2024

@dnil I've fixed paths to images (cases and vars), chromograph and reviewer. Comparing with the demo case config it looks like that's all

At the seminar now but will check later!

Copy link
Collaborator

@dnil dnil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this should now close #1394 afaik, so I approve. I guess we could say it starts #4399 as well, but there are other simpler value changes in build. They can also be determined by just looking at the config file (e.g. variable type casts, booleans for valid config values (like has_svvariants) and should go with the CaseLoader when allowing it to change values. But we can do them after.

item_path: str = values.get(item)
if (
item_path
): # it is an incomplete path -> replace the path to the directory with the abs path to the directory
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move the comments to the docstring? There is only one thing happening here really, if the key exists on the obj. Using more descriptive variable names would also alleviate the need to comment the code. 😊

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

scout/models/case/case_loading_models.py Outdated Show resolved Hide resolved
scout/models/case/case_loading_models.py Outdated Show resolved Hide resolved
scout/models/case/case_loading_models.py Outdated Show resolved Hide resolved
scout/models/case/case_loading_models.py Outdated Show resolved Hide resolved
item_path_dirname: str = dirname(
item_path
) # path to the directory containing multiple files
values[item] = item_path.replace(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like here.. values[item] could be made into a short essay describing what they represent..

Suggested change
values[item] = item_path.replace(
config_values[chromograph_path_item] = item_path.replace(

Copy link
Member Author

@northwestwitch northwestwitch Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I have replaced the loop in the model check with for key in cls.model_fields:, so I think it's clear that the keys used for collecting the values are the model fields!

scout/models/case/case_loading_models.py Show resolved Hide resolved
tests/parse/test_parse_case.py Show resolved Hide resolved
scout/models/case/case_loading_models.py Outdated Show resolved Hide resolved
scout/models/case/case_loading_models.py Outdated Show resolved Hide resolved
return False
def _resource_abs_path(string_path: str) -> str:
"""Return the absolute path to a resource file."""
if exists(string_path) and isabs(string_path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could flatten this conditional as

Suggested change
if exists(string_path) and isabs(string_path):
if not exists(string_path):
raise FileExistsError(f"Path {string_path} could not be found on disk.")
if isabs(string_path):
return string_path
return abspath(string_path)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I like it!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

Copy link

sonarcloud bot commented Mar 26, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@northwestwitch
Copy link
Member Author

I tested it again locally and seems to parse stuff fine. Merging when tests pass. Thanks for the useful review @dnil!

@northwestwitch northwestwitch merged commit 8c3dadc into main Mar 26, 2024
20 checks passed
@dnil dnil deleted the pydantic_fix_case_paths branch March 26, 2024 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants