
fix(l2g): remove custom session params + other fixes #841

Merged: ireneisdoomed merged 5 commits into dev from il-l2g-session on Oct 14, 2024


ireneisdoomed (Contributor) commented Oct 14, 2024

✨ Context

@d0choa observed a Java memory issue when running the L2G training step:

Caused by: [CIRCULAR REFERENCE: org.apache.spark.SparkException: Cannot broadcast the table that is larger than 8GB: 13 GB]

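The message comes from Spark's broadcast exchange, which refuses to broadcast a relation once its size reaches a hard cap. The rough pure-Python model below mimics that check. It assumes an 8 GiB cap (`8 << 30` bytes) and the message format quoted above. The function `check_broadcast_size` is hypothetical and is not Spark code.

```python
# Rough model of Spark's broadcast size guard (hypothetical, not Spark's code).
# Assumption: the cap is 8 GiB (8 << 30 bytes) and sizes are reported in whole
# GiB by right-shifting the byte count by 30 bits.
MAX_BROADCAST_TABLE_BYTES = 8 << 30


def check_broadcast_size(data_size_bytes: int) -> None:
    """Raise if a table is too large to broadcast, mimicking Spark's message."""
    if data_size_bytes >= MAX_BROADCAST_TABLE_BYTES:
        raise RuntimeError(
            "Cannot broadcast the table that is larger than "
            f"{MAX_BROADCAST_TABLE_BYTES >> 30}GB: {data_size_bytes >> 30} GB"
        )


# A ~13 GiB table, as in the error above, trips the guard.
try:
    check_broadcast_size(13 << 30)
except RuntimeError as err:
    print(err)  # Cannot broadcast the table that is larger than 8GB: 13 GB
```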

The L2G training step had a custom session that provided more memory to the cluster. It was inherited from the previous L2G version, in which this step also generated the feature matrix and training ran in Spark.
Because L2G is now a much lighter step, the default Spark configuration should be enough.
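One way to picture the custom session versus the default one is as optional overrides merged on top of baseline Spark settings. This is a minimal sketch. The `build_spark_conf` helper and the default values are hypothetical, and the override value is made up. It is not gentropy's session implementation. The keys themselves (`spark.driver.memory`, `spark.sql.shuffle.partitions`) are standard Spark settings.

```python
from typing import Optional

# Hypothetical baseline settings; the real defaults live in the session class.
DEFAULT_SPARK_CONF: dict[str, str] = {
    "spark.driver.memory": "4g",
    "spark.sql.shuffle.partitions": "200",
}


def build_spark_conf(
    extended_spark_conf: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Return the default settings overlaid with any step-specific extensions."""
    conf = dict(DEFAULT_SPARK_CONF)  # copy so the defaults are never mutated
    if extended_spark_conf:
        conf.update(extended_spark_conf)
    return conf


# Before: the step shipped its own overrides (value made up for illustration).
custom = build_spark_conf({"spark.driver.memory": "32g"})
# After: with extended_spark_conf=None, only the defaults apply.
default = build_spark_conf(None)
print(custom["spark.driver.memory"], default["spark.driver.memory"])  # 32g 4g
```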

🛠 What does this PR implement

  • Removal of the session parameter in LocusToGeneConfig that extended the Spark configuration. I still define a default session inside the step configuration, because otherwise I would have to pass a session just to access any step attribute. This is because the step configuration inherits from StepConfig, which requires a session. For example:
```python
from dataclasses import dataclass, field
from typing import Any

from gentropy.config import StepConfig

# If LocusToGeneConfig doesn't define a default session:
LocusToGeneConfig().features_list
# TypeError: LocusToGeneConfig.__init__() missing 1 required positional argument: 'session'

# Default session value for LocusToGeneConfig:
@dataclass
class LocusToGeneConfig(StepConfig):
    """Locus to gene step configuration."""

    session: Any = field(default_factory=lambda: {"extended_spark_conf": None})
    # ... other step parameters (e.g. features_list) omitted

LocusToGeneConfig().features_list
# ['eQtlColocClppMaximum',
#  'pQtlColocClppMaximum',
#  'sQtlColocClppMaximum',
#  'eQtlColocH4Maximum',
#  ...]
```
  • Minor fix: added the colocalisation neighbourhood features to the feature matrix step.
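The default-session trick above can be reproduced with plain `dataclasses`. The self-contained sketch below uses simplified stand-ins, not gentropy's actual classes. `StepConfig` here only mirrors the claim that the base config requires a session, and the feature list is truncated to the four names shown in the output above. Re-declaring an inherited field with a default in a subclass makes that field optional.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StepConfig:
    """Simplified base config: `session` has no default, so it is required."""

    session: Any


@dataclass
class LocusToGeneConfig(StepConfig):
    """Simplified L2G config that gives the inherited `session` a default."""

    # Re-declaring the inherited field with a default makes it optional.
    # default_factory is used because dataclasses reject dict defaults.
    session: Any = field(default_factory=lambda: {"extended_spark_conf": None})
    # Truncated feature list (the real step defines more features).
    features_list: list[str] = field(
        default_factory=lambda: [
            "eQtlColocClppMaximum",
            "pQtlColocClppMaximum",
            "sQtlColocClppMaximum",
            "eQtlColocH4Maximum",
        ]
    )


# The base class still cannot be built without a session.
try:
    StepConfig()
except TypeError as err:
    print(err)

config = LocusToGeneConfig()
print(config.features_list[0])  # eQtlColocClppMaximum
print(config.session)  # {'extended_spark_conf': None}
```

Because the factory builds a fresh dict for every instance, two configs never share the same session mapping. A plain `session: Any = {...}` default would instead be rejected by `dataclasses` with a `ValueError` about mutable defaults.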

🙈 Missing

I didn't run the whole L2G step (I ran into an error that #837 fixes), but that crash was unrelated to the Java memory issue.

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. poetry run pre-commit run --all-files)?

github-actions bot added the bug (Something isn't working), size-S and Step labels on Oct 14, 2024
ireneisdoomed marked this pull request as ready for review on October 14, 2024 15:09
ireneisdoomed merged commit 6817aad into dev on Oct 14, 2024 (5 checks passed)
ireneisdoomed deleted the il-l2g-session branch on October 14, 2024 15:18
2 participants