Enhance find_outliers and identify_outliers performance by avoiding duplication and filtering columns #140

Merged
d33bs merged 4 commits into WayScience:main from read-only-needed-columns on Dec 10, 2024

@d33bs (Member) commented Nov 20, 2024

Description

This PR improves the performance of find_outliers and identify_outliers by filtering to only the necessary columns and avoiding data duplication where possible. While working on this, I also reviewed code coverage and found a number of legacy fixtures that are no longer used here; removing them should raise the overall coverage percentage.
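As a rough illustration of the idea (not the code from this PR), the sketch below computes z-scores on only the columns named in the thresholds and returns a boolean mask instead of a copy of the full table. The helper name find_outlier_mask, the column names, and the sign convention for thresholds (positive flags values above, negative flags values below) are assumptions made for this example.

```python
import pandas as pd


def find_outlier_mask(df: pd.DataFrame, thresholds: dict) -> pd.Series:
    """Return a boolean mask of rows that satisfy every z-score threshold.

    Hypothetical helper for illustration only; not the cosmicqc implementation.
    """
    # Select only the columns named in the thresholds, not the whole table.
    needed = list(thresholds)
    subset = df[needed]

    # Z-score each needed column (pandas uses the sample standard deviation).
    zscores = (subset - subset.mean()) / subset.std()

    # Start with every row selected, then narrow the mask per threshold.
    mask = pd.Series(True, index=df.index)
    for column, limit in thresholds.items():
        # Assumed convention: positive limits flag high values,
        # negative limits flag low values.
        if limit >= 0:
            mask &= zscores[column] > limit
        else:
            mask &= zscores[column] < limit
    return mask


# Example table with one measurement column and one column the check never touches.
df = pd.DataFrame(
    {
        "area": [1.0, 1.0, 1.0, 1.0, 10.0],
        "unused": ["a", "b", "c", "d", "e"],
    }
)
mask = find_outlier_mask(df, {"area": 1.5})
outliers = df.loc[mask]  # rows are materialized only when the caller asks for them
```

Returning a mask keeps the original frame untouched, so a caller can decide whether to pull full rows, only the index, or just a count.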

Closes #134
Closes #86

What kind of change(s) are included?

  • Documentation (changes docs or other related content).
  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • These changes pass all pre-commit checks.
  • I have added comments to my code to help provide understanding.
  • I have added a test which covers the code changes found within this PR.
  • I have deleted all non-relevant text in this pull request template.

@d33bs marked this pull request as ready for review November 20, 2024 15:44

@jenna-tomkinson (Member) left a comment

LGTM! 🎉 Love this small change! Just curious, is this behavior going to be the default, with the user able to choose to load all columns?

src/cosmicqc/analyze.py (review thread resolved)

@d33bs (Member, Author) commented Dec 10, 2024

Thanks @jenna-tomkinson, merging this in!

@d33bs merged commit b3be22e into WayScience:main Dec 10, 2024
9 checks passed
@d33bs deleted the read-only-needed-columns branch December 10, 2024 16:44