Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRY refactoring #60

Merged
merged 35 commits into from
May 23, 2023
Merged

DRY refactoring #60

merged 35 commits into from
May 23, 2023

Conversation

charles-cowart
Copy link
Contributor

Goal is to refactor code so that there is minimal duplication, easily extensible to enable metatranscriptomics and future Assay types, and breakout all of the functionality from the main plugin function so they can be tested properly.

Goal is to refactor code so that there is minimal duplication, easily
extensible to enable metatranscriptomics and future Assay types, and
breakout all of the functionality from the main plugin function so they
can be tested properly.
@charles-cowart charles-cowart requested a review from antgonza May 12, 2023 17:19
@charles-cowart charles-cowart changed the base branch from main to dev May 18, 2023 17:14
Copy link
Member

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @charles-cowart, I have some concerns about the possibility of duplication of code; please let me know what you think.

.gitignore Show resolved Hide resolved
qp_klp/AmpliconStep.py Outdated Show resolved Hide resolved
qp_klp/AmpliconStep.py Outdated Show resolved Hide resolved
qp_klp/AmpliconStep.py Outdated Show resolved Hide resolved
qp_klp/AmpliconStep.py Outdated Show resolved Hide resolved
qp_klp/MetagenomicStep.py Outdated Show resolved Hide resolved
qp_klp/klp.py Outdated Show resolved Hide resolved
qp_klp/tests/test_amplicon_step.py Outdated Show resolved Hide resolved
qp_klp/tests/test_amplicon_step.py Outdated Show resolved Hide resolved
qp_klp/tests/test_amplicon_step.py Outdated Show resolved Hide resolved
@charles-cowart
Copy link
Contributor Author

@antgonza I've pushed a commit of quick updates based on your feedback, thanks! I haven't responded to all of your concerns yet, but I will shortly.

Copy link
Member

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comments before. Now, a general question, if you are "extracting" the general functions from Amplicon/Metagenomic Step and adding them to the base class Step, does it really make sense to have a klp_util.py file? I mean basically you have to access method X in Amplicon/Metagenomic Step, that will call X method in Step and then do X in klp_util.py; no? Would it be better to remove one step and just go either Amplicon/Metagenomic Step -> klp_util.py or Amplicon/Metagenomic Step -> Step and remove klp_util.py?

.gitignore Show resolved Hide resolved
qp_klp/AmpliconStep.py Outdated Show resolved Hide resolved
qp_klp/AmpliconStep.py Outdated Show resolved Hide resolved
qp_klp/tests/test_amplicon_step.py Outdated Show resolved Hide resolved
Removed configuration.json.
Additional updates based on testing and feedback.
@charles-cowart
Copy link
Contributor Author

Thank you for the comments before. Now, a general question, if you are "extracting" the general functions from Amplicon/Metagenomic Step and adding them to the base class Step, does it really make sense to have a klp_util.py file? I mean basically you have to access method X in Amplicon/Metagenomic Step, that will call X method in Step and then do X in klp_util.py; no? Would it be better to remove one step and just go either Amplicon/Metagenomic Step -> klp_util.py or Amplicon/Metagenomic Step -> Step and remove klp_util.py?

We're definitely on the same page. The idea was to incorporate klp_util.py into the base Step class. I left them as is for the time being, just because so many other things had changed already.

@antgonza
Copy link
Member

We're definitely on the same page. The idea was to incorporate klp_util.py into the base Step class. I left them as is for the time being, just because so many other things had changed already.

I think this will be the only blocker before merging this initial PR and this would be the base for future ones, could you do this now before waiting for a better time?

Copy link
Member

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions.

qp_klp/Amplicon.py Outdated Show resolved Hide resolved
qp_klp/Amplicon.py Outdated Show resolved Hide resolved
qp_klp/Step.py Outdated Show resolved Hide resolved
qp_klp/Metagenomic.py Outdated Show resolved Hide resolved
qp_klp/Metagenomic.py Outdated Show resolved Hide resolved
qp_klp/klp.py Outdated Show resolved Hide resolved
Copy link
Member

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments.

Comment on lines +74 to +77
job_output = [join(raw_fastq_files_path, x) for x in
listdir(raw_fastq_files_path)]
job_output = [x for x in job_output if isfile(x) and x.endswith(
'fastq.gz') and not basename(x).startswith('Undetermined')]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess these 2 can be merged, no?

Copy link
Contributor Author

@charles-cowart charles-cowart May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to keep them separate, because you want x in isfile(x) in the second one to be the full path created by join(raw_fastq_files_path, x) in the first one and if we merged them that would mean we'd have something like:

job_output = [join(raw_fastq_files_path, x) for x in listdir(raw_fastq_files_path) if isfile(join(raw_fastq_files_path, x)) and x.endswith('fastq.gz') and not basename(x).startswith('Undetermined')]]

The join() has to be present twice.

def generate_prep_file(self):
config = self.pipeline.configuration['seqpro']

seqpro_path = config['seqpro_path'].replace('seqpro', 'seqpro_mf')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the plan to remove seqpro_mf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an easy enough change on this side of things, and a little bit of refactoring for metapool package. I can make the change by the end of the week in between other things.

config = self.pipeline.configuration['bcl-convert']
job = super()._convert_bcl_to_fastq(config,
self.pipeline.sample_sheet.path)
self.fsr.write(job.audit(self.pipeline.get_sample_ids()), 'ConvertJob')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like at the end of each method something similar to this line is called, should it be more general? I mean something like self.failed_samples_audit('ConvertJob') where that calls something like self.write(job.audit(self.pipeline.get_sample_ids()), audit_name)

Copy link
Contributor Author

@charles-cowart charles-cowart May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this out, but it seems like it will just create another function something like the following:
def _failed_samples_audit(self, job, step_name):
self.fsr.write(job.audit(self.pipeline.get_sample_ids()), step_name)

It doesn't really do much. The meat of each audit is in each Job class, while the structure and writing of each failed samples record is in the FailedSamplesRecord class.

qp_klp/Step.py Outdated Show resolved Hide resolved
qp_klp/klp.py Outdated Show resolved Hide resolved
qp_klp/klp.py Outdated Show resolved Hide resolved
qp_klp/tests/test_amplicon_step.py Outdated Show resolved Hide resolved
Copy link
Member

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments.

@charles-cowart charles-cowart changed the title WIP DRY refactoring DRY refactoring May 23, 2023
@antgonza antgonza merged commit e26c59b into qiita-spots:dev May 23, 2023
antgonza added a commit that referenced this pull request May 26, 2023
* DRY refactoring (#60)

* WIP DRY refactoring

Goal is to refactor code so that there is minimal duplication, easily
extensible to enable metatranscriptomics and future Assay types, and
breakout all of the functionality from the main plugin function so they
can be tested properly.

* Forgot to add new Step files to DRY refactor.

Some Additional modifications added.

* Laying out new test classes

* Added support for base Step tests

* Ensure output_dir is created

* bugfix

* Temporarily set mg-scripts dependency to development version

* .gitignore fix

* First test w/pseudo job-submission added.

* CI Fixes

* Fix for CI

* debug CI.

test works as intended locally

* debugging CI

* ci debug

* ci debug

* ci debug

* Added test for base QCJob

* Added MetagenomicStep tests

* Updated testing infrastructure

* Added AmpliconStep tests

* flake8

* setup now points to merged mg-scripts.

* Updated code to create fake files in a searchable path.

* Bugfix

* flake8

* Easy updates based on feedback

* Removed configuration.json

Removed configuration.json.
Additional updates based on testing and feedback.

* Merged klp_util.py into Step and klp.py based on feedback

* Removed test_klp_util.py

* Updates based on testing

* Updates based on feedback

* Changes based on feedback

* Add new file

* Changes based on feedback

* bugfix

* Testing updates (#61)

* Updates based on testing on qiita-rc

* Updates based on feedback

* Updates based on feedback

* Removed sn_tid_map_by_project as a constructor parameter for Step()

* flake8

* Updates based on feedback

* Added test for get_special_map()

* Added test_get_tube_ids_from_qiita()

* Split BaseStepTests into two halves.

Tests in BaseStepTests were being inherited by child BaseStepTests and
run two more times (one for Amplicon and one for Metagenomic).
Partitioning out general tests into a new child class allows all
children to inherit configuration, but not inherit tests.

* Updates based on feedback

* Updates based on feedback

* Migrated pipeline calls into a single function

Migrated pipeline calls into a single function to preserve their
sequence, offer potential alternatives, and include a debug option to
not update Qiita.

* Adding coveralls to project

* bugfix

* test coveralls

* Updating coveralls

* Testing codecov integration

* testing codecov

* Testing codecov

---------

Co-authored-by: Charles Cowart <42684307+charles-cowart@users.noreply.github.com>
charles-cowart added a commit that referenced this pull request Jun 6, 2023
This code was originally reviewed and approved as part of the following PRs:
#60
#61

Code refactored to be more extensible to new Assay types.
Also, monolithic unit-tests that required testing on Qiita-RC were removed.
New unittests are more numerous and test smaller self-contained functionality.
Qiita is no longer needed for testing; FakeQiita class is now used to emulate canned Qiita API queries.
Job submission and SLURM responses are also emulated w/fake binaries.
charles-cowart added a commit that referenced this pull request Jun 7, 2023
This code was originally reviewed and approved as part of the following PRs:
#60
#61

Code refactored to be more extensible to new Assay types.
Also, monolithic unit-tests that required testing on Qiita-RC were removed.
New unittests are more numerous and test smaller self-contained functionality.
Qiita is no longer needed for testing; FakeQiita class is now used to emulate canned Qiita API queries.
Job submission and SLURM responses are also emulated w/fake binaries.
charles-cowart added a commit that referenced this pull request Jun 7, 2023
This code was originally reviewed and approved as part of the following PRs:
#60
#61

Code refactored to be more extensible to new Assay types.
Also, monolithic unit-tests that required testing on Qiita-RC were removed.
New unittests are more numerous and test smaller self-contained functionality.
Qiita is no longer needed for testing; FakeQiita class is now used to emulate canned Qiita API queries.
Job submission and SLURM responses are also emulated w/fake binaries.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants