
chore(l2g): log annotated gold standards in w&b #546

Merged: DSuveges merged 5 commits into dev from il-log-l2g on Mar 18, 2024


ireneisdoomed (Contributor) commented Mar 15, 2024

✨ Context

This is the entry point for training the L2G model:

LocusToGeneTrainer.train(
    gold_standard_data=data,
    l2g_model=l2g_model,
    model_path=model_path,
    evaluate=True,
    wandb_run_name=wandb_run_name,
    **hyperparameters,
)

Everything except the input gold_standard_data is a fixed parameter.
If we wanted to speed up experimentation, as I suggest in opentargets/issues#3253, we would only need to store gold_standard_data.

We can use W&B for that. We currently upload only the train set of the annotated gold standards to W&B. I want to log the full matrix before it is split into train and test sets. Because the split uses a fixed seed, the pipeline remains reproducible.

I like this idea because it binds the input data to the output model.

🛠 What does this PR implement

Changes LocusToGeneModel.evaluate so that the full annotated gold standard matrix, rather than only the train set, is logged to W&B.
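The reproducibility argument in the Context section can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the gentropy implementation: `split_train_test`, `prepare_training_data`, the `log_table` callback (a stand-in for the W&B logging call), the seeded `random.Random` shuffle and the toy rows are all hypothetical. It shows that once the full matrix is logged before splitting, rerunning from the logged copy with the same seed recovers the same train and test sets.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    rows: Sequence[T], test_fraction: float, seed: int
) -> Tuple[List[T], List[T]]:
    """Hypothetical seeded split: shuffle a copy of rows, then cut it into (train, test)."""
    rng = random.Random(seed)  # fixed seed -> same permutation for the same input order
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def prepare_training_data(
    gold_standard: Sequence[T],
    log_table: Callable[[str, List[T]], None],
    test_fraction: float = 0.2,
    seed: int = 42,
) -> Tuple[List[T], List[T]]:
    # Log the FULL annotated matrix before any split; `log_table` stands in for W&B.
    log_table("gold_standard_data", list(gold_standard))
    return split_train_test(gold_standard, test_fraction, seed)


# Usage: the first run logs the data; the second run starts from the logged copy.
logged = {}
gold_standard = [(f"study_locus_{i}", f"gene_{i}", i % 2) for i in range(10)]
train_a, test_a = prepare_training_data(gold_standard, logged.__setitem__)
train_b, test_b = prepare_training_data(
    logged["gold_standard_data"], lambda name, rows: None
)
print(train_a == train_b and test_a == test_b)
```

In this sketch, logging only the train split would lose the test rows, whereas logging the full matrix keeps both recoverable given the seed (within the same Python version, since the shuffle algorithm is version-specific).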

🙈 Missing

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. poetry run pre-commit run --all-files)?

ireneisdoomed marked this pull request as ready for review March 15, 2024 15:42

DSuveges (Contributor) left a comment

It looks OK. No obvious issues. Have you tried running it?

ireneisdoomed (Contributor, Author) replied

It's running right now. It shouldn't cause issues because we were already logging a very similar table (only the train set). I just changed the input to the logging function and refactored the train function.

DSuveges merged commit b3a5664 into dev on Mar 18, 2024
4 checks passed
DSuveges deleted the il-log-l2g branch March 18, 2024 15:15