chore(l2g): log annotated gold standards in w&b #546

ireneisdoomed · 2024-03-15T15:10:59Z

✨ Context

This is the entry API to train the L2G model:

LocusToGeneTrainer.train(
    gold_standard_data=data,
    l2g_model=l2g_model,
    model_path=model_path,
    evaluate=True,
    wandb_run_name=wandb_run_name,
    **hyperparameters,
)

Everything except the input gold_standard_data is a fixed parameter.
If we wanted to speed up experimenting as I suggest in opentargets/issues#3253, we would just need to store gold_standard_data.

We can use W&B for that. We currently upload only the train set of the annotated gold standards to W&B. I want to log the full matrix before it is split into train and test sets. Because the split uses a seed, the pipeline stays reproducible.

I like this idea because it binds the input data to the output model.
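
To illustrate why logging the full matrix is enough when the split is seeded, here is a minimal sketch using only the Python standard library. It is not the project's code: `split_with_seed`, the `row_id`/`label` fields, and the 80/20 fraction are hypothetical stand-ins, and the real pipeline may split its data with a different mechanism.

```python
import random


def split_with_seed(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows with a seeded RNG, then cut into train/test."""
    shuffled = list(rows)  # copy, so the stored full matrix is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the annotated gold standard matrix.
full_matrix = [{"row_id": i, "label": i % 2} for i in range(10)]

# Two independent runs over the same stored matrix with the same seed.
train_a, test_a = split_with_seed(full_matrix, seed=42)
train_b, test_b = split_with_seed(full_matrix, seed=42)
```

Both calls return identical train and test sets, so anyone who retrieves the logged full matrix and knows the seed can recover the exact split the model was trained on.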

🛠 What does this PR implement

Changes in LocusToGeneModel.evaluate to accomplish the suggestion above.

🙈 Missing

🚦 Before submitting

Do these changes cover one single feature (one change at a time)?
Did you read the contributor guideline?
Did you make sure to update the documentation with your changes?
Did you make sure there is no commented out code in this PR?
Did you follow conventional commits standards in PR title and commit messages?
Did you make sure the branch is up-to-date with the dev branch?
Did you write any new necessary tests?
Did you make sure the changes pass local tests (make test)?
Did you make sure the changes pass pre-commit rules (e.g. poetry run pre-commit run --all-files)?

DSuveges

It looks OK. No obvious issues. Have you tried running it?

ireneisdoomed · 2024-03-18T15:07:26Z

It's running right now. It shouldn't have issues because we were already logging a very similar table (only the train set). I just changed the input to the logging function and refactored the train function.

ireneisdoomed added 2 commits March 15, 2024 11:47

refactor(l2g): select features of interest in data outside trainer

73f9913

chore(l2g): log annotated gold standards in w&b

36fb4fe

github-actions bot added size-S Method Step Chore labels Mar 15, 2024

fix: update test_train

5fe9b4b

ireneisdoomed marked this pull request as ready for review March 15, 2024 15:42

ireneisdoomed added 2 commits March 18, 2024 12:36

Merge branch 'dev' into il-log-l2g

7f84986

Merge branch 'dev' into il-log-l2g

8d2c943

DSuveges approved these changes Mar 18, 2024

DSuveges merged commit b3a5664 into dev Mar 18, 2024
4 checks passed

DSuveges deleted the il-log-l2g branch March 18, 2024 15:15
