Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove GNU parallel dependency #33

Merged
merged 3 commits into from
Jan 30, 2023
Merged

Conversation

semenko
Copy link

@semenko semenko commented Jan 19, 2023

Use bash builtins instead of GNU parallel.

This is slightly faster than the parallel implementation.

Output identical, compared with diff -bur ../biscuit-qc/ ../biscuit-qc-patch-1/

This removes parallel in favor of bash builtins.

Performance is similar/mildly faster.
@semenko
Copy link
Author

semenko commented Jan 30, 2023

Hey @jamorrison is it useful for me to updated this — or should I close this PR?

(Don't see a good reason to use parallel at all here.)

@jamorrison
Copy link

@semenko I think this was a good update. I noticed this after I started the release process for BISCUIT v1.2.0 though, so it'll go into the next release of BISCUIT. Thanks for the PR!

@jamorrison jamorrison merged commit 960ddaf into huishenlab:master Jan 30, 2023
@semenko
Copy link
Author

semenko commented Jan 30, 2023

Thanks! Does this latest version (1.2.0) auto-push to bioconda? (Exciting changes!)

@jamorrison
Copy link

I have a PR in testing for bioconda at the moment. Should be available in a day or two, review and approval dependent.

@jamorrison
Copy link

Hi @semenko The latest version of biscuit (1.2.0) is available on bioconda now.

@bounlu
Copy link

bounlu commented Jun 7, 2023

Hi,

I am curious to know why bash builtin background processing should be faster than GNU parallel?

If not slower, I would still go with GNU parallel as it has better capabilities and handling with parallel execution although this is not very much applicable in this case.

@semenko
Copy link
Author

semenko commented Jun 7, 2023

For a few reasons, parallel was forcing extra work. For example this:

https://github.com/huishenlab/biscuit/pull/33/files#diff-03f6544f78af96f7e16c293b67a23d876cd5a1d585d61658675c2b282a6c552dL136
image

parallel was running samtools view, then we running it again multiple times for quality filters (and other options) not implemented by bedtools commands.

Plus no need for extra dependencies :P

@bounlu
Copy link

bounlu commented Jun 7, 2023

Ahh I see, sorry that I missed the extra code. This is because parallel requires an input sequence to work with. However, this can be tricked like here to make it equivalent to background processing you implemented.

And I would not really call awk and parallel as extra dependencies as they are almost always built-in ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants