Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: cohort extraction on latest GWAS Catalog release files #521

Closed

Conversation

RobinM-code
Copy link
Contributor

✨ Context

I tried to run gwas_catalog_ingestion from the gentropy package. It seems that this step does not work with the current GWAS Catalog files. (I downloaded the catalog_study_files, catalog_ancestry_files and catalog_associations_file from here: https://ftp.ebi.ac.uk/pub/databases/gwas/releases/2024/03/04/)

To be more specific, the COHORT(S) header was removed from the catalog_ancestry_files. However, it was placed back in the catalog_study_files as COHORT. See documentation from GWAS Catalog: https://www.ebi.ac.uk/gwas/docs/fileheaders#_file_headers_for_unpublished_ancestries

I believe @DSuveges has found this as well. See his 3rd task in this PR: #507

🛠 What does this PR implement

This PR makes sure the COHORT(S) are extracted from the catalog_study_files.
Sample files and pytests have been updated.
Note that I had to update pre-commit-config.yaml as well. (It updated while running make setup-dev. Not including the files made pre-commit run into errors)

🙈 Missing

I am not yet aware of the full gentropy package. Could there be any other steps that use the COHORT(S) header from the catalog_ancestry_files?

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g poetry run pre-commit run --all-files)?

@d0choa d0choa requested a review from DSuveges March 6, 2024 17:17
Copy link
Contributor

@DSuveges DSuveges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @RobinM-code, Thanks for your PR! Yes, indeed the GWAS Catalog recently made some changes in data export, which made some changes on our side necessary. Based on communication with the GWAS Catalog team, this change is planned and permanent. (We actually started to follow their announcement mailing list to keep up with any planned changes for this reasons)

Although I addressed this issue on my branch (except the update of the test data), I'll approve and merge this PR, because the PR you were referencing tries to deal with other problems I'm not fully ready with.

Thank you for your contribution, please keep looking into the code, try to run things and please feel free to contact us if you have question or raise issue if you find something off. We highly value any input.

Copy link
Contributor

@DSuveges DSuveges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately a pre-commit update change needs to be reverted

@@ -56,7 +56,7 @@ repos:
exclude: "CHANGELOG.md"

- repo: https://github.com/alessandrojcm/commitlint-pre-commit-hook
rev: v9.11.0
rev: v9.13.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following a discussion with @d0choa, this update needs to be reverted: as seen on the pre-commit forums and as experienced by us this update causes pre-commits fail.

On an other note, we were wondering if the pre-commit checks were run on your machine? There can be a divergence between the actual local environment and the remote dev branch leading to complications. So a regular make setup-dev is advised to make sure the local environment is always up to date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @DSuveges, I was already afraid this would cause some trouble. I am happy to revert, but as mentioned, when running make setup-dev this file is automatically updated. If I revert the updates pre-commit fails and I am unable to do any commits...

Trying to revert the specific commit:
image

Running make setup-dev: (notice the updating of ruff and commitlint)
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR merged today removes the poblematic row from the makefile. If you update your branch that should sort out these issues hopefully.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @DSuveges,

I have updated our dev branch to include #524. I re-created the development environment by running make setup-dev but keep running into this error.

image

May I ask which node version you are using?

@louwenjjr louwenjjr mentioned this pull request Mar 11, 2024
9 tasks
gwas catalog v1.0.3.1 moved cohort from ancestry to study files. This
breaks the `annotate_ancestries` method in the `StudyIndexGWASCatalog`
class. Use the gwas catalog study files to get the cohorts instead.
@RobinM-code
Copy link
Contributor Author

Hi @DSuveges,

I have rebased this branch on dev. Additionally: #522 and #533 have helped me to get rid of the pre-commit errors I was facing.
Finally, I removed my updates to the test files since they have been updated by you in a7c919a

@ireneisdoomed ireneisdoomed changed the title Fix: cohort extraction on latest GWAS Catalog release files fix: cohort extraction on latest GWAS Catalog release files Mar 22, 2024
@DSuveges
Copy link
Contributor

DSuveges commented Jun 5, 2024

Closing this PR, as the original problem due to the column name changes in the GWAS Catalog datasets were already addressed.

@DSuveges DSuveges closed this Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants