Clarify when chunking files #495
I was surprised to see chunking happening in sourmash... Turns out it's coming from I'm guessing you're using
will create the index in (I need to document this better, but the
And you can also use
(pinging @taylorreiter for more large scale
Other than
I have a few other tricks for dealing with the very large compare matrix that is output by this, but those are pretty use-case specific.
Oh interesting! I'll use For
For the large compare matrix, do you end up saving it as a sparse matrix?
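To make the sparse-matrix idea concrete, here is a minimal Python sketch under simplifying assumptions: each signature is modeled as a plain set of hash values, similarity is plain Jaccard, and only pairs at or above a chosen threshold are kept in a dictionary keyed by `(name_a, name_b)`. The names `jaccard`, `sparse_similarities`, and `threshold` are hypothetical; this is not how `sourmash compare` stores its output.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets of hash values (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sparse_similarities(sketches: dict, threshold: float) -> dict:
    """Keep only pairs (x, y), x < y by name, whose similarity is >= threshold.

    Pairs below the threshold are simply not stored, so the result is a
    dictionary-of-keys sparse representation of the upper triangle.
    """
    names = sorted(sketches)
    kept = {}
    for x, y in combinations(names, 2):
        sim = jaccard(sketches[x], sketches[y])
        if sim >= threshold:
            kept[(x, y)] = sim
    return kept
```

When most pairs fall below the threshold, this stores far fewer than the n(n-1)/2 entries of a dense upper triangle; the cost is that sub-threshold pairs become indistinguishable from zero.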
Hi @olgabot! I added `--traverse-directory` as an option for `compare` last week, if you'd like to give it a try.
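Roughly speaking, a directory-traversal option has to walk a directory tree and collect signature files instead of requiring every path on the command line. A minimal standard-library sketch follows; it assumes signature files end in `.sig`, which is a hypothetical convention here and not necessarily how `--traverse-directory` decides what to load. `find_signature_files` is an illustrative name.

```python
import os


def find_signature_files(root: str, suffix: str = ".sig") -> list:
    """Recursively collect paths under `root` whose file name ends in `suffix`.

    Results are sorted so repeated runs load signatures in the same order.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```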
I think
re sparse matrix, no, I have not experimented with that. I was comparing the tetranucleotide frequencies of each contig across a highly fragmented eukaryotic genome. I ended up running compare on a subset instead, 1/4 at a time. Saving the matrix as a CSV is also a bad idea when dealing with many comparisons. Lastly, @luizirber has recommended dask before.
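Comparing a subset at a time can be generalized into a block-wise computation. The sketch below splits the inputs into `n_blocks` nearly equal slices and yields one block pair at a time, so only one sub-matrix is in memory at once and each can be written to its own file (for example in a binary format rather than CSV). It assumes you also want the cross-subset pairs, which the comment above does not say was done, and that the similarity is symmetric; `split_into_blocks`, `compare_by_blocks`, and the `similarity` callback are hypothetical names, not part of sourmash.

```python
from itertools import combinations_with_replacement


def split_into_blocks(items, n_blocks):
    """Split `items` into `n_blocks` contiguous slices whose sizes differ by at most 1."""
    size, extra = divmod(len(items), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        # The first `extra` blocks get one additional item each.
        end = start + size + (1 if b < extra else 0)
        blocks.append(items[start:end])
        start = end
    return blocks


def compare_by_blocks(sketches, n_blocks, similarity):
    """Yield (bi, bj, sub) for every block pair with bi <= bj.

    sub[r][c] is similarity(sketches[i], sketches[j]), where i is the r-th
    index in block bi and j is the c-th index in block bj. Only one
    sub-matrix is built at a time; if `similarity` is symmetric, the lower
    triangle follows from the upper one.
    """
    blocks = split_into_blocks(list(range(len(sketches))), n_blocks)
    for bi, bj in combinations_with_replacement(range(n_blocks), 2):
        sub = [[similarity(sketches[i], sketches[j]) for j in blocks[bj]]
               for i in blocks[bi]]
        yield bi, bj, sub
```

With `n_blocks=4` there are 4 * 5 / 2 = 10 block pairs, including the 4 diagonal blocks that a subset-only approach would compute.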
Thanks, everyone!
Hello! I ran `sourmash compute` on ~50k single-cell (SmartSeq2 library prep) RNA-seq samples (they are here if you would like to see them: `aws s3 ls s3://olgabot-maca/facs/sourmash/`) and wanted to index/compare them all against our cell-cell distances/clusters/annotations computed from gene count tables.

At first, I thought `sourmash compare` was broken because it said it was only loading 3444 signatures out of the 50k:

But there are 51,446 files here!!

But then `sourmash index` was more explicit in showing that it was chunking the data:

So it seems that `sourmash compute` was not broken after all, but just taking its time through all the samples.

Here are my questions:

on chunk 1/20
be output to the stdout?

Thank you!

EDIT: This was run on an AWS EC2 m4.large instance.
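The chunked loading described in the issue can be sketched as follows. This is an illustration under stated assumptions, not sourmash's implementation: it splits a list of paths into nearly equal chunks, writes a `chunk k/N` progress line per chunk to a stream of your choice, and calls a caller-supplied `load_one` function on each path. `iter_chunks`, `load_in_chunks`, and `load_one` are hypothetical names.

```python
import sys


def iter_chunks(paths, n_chunks):
    """Yield (k, total, chunk) for nearly equal, contiguous chunks of `paths`."""
    # Never produce more chunks than there are paths, but always at least one.
    total = max(1, min(n_chunks, len(paths)))
    size, extra = divmod(len(paths), total)
    start = 0
    for k in range(total):
        # The first `extra` chunks get one additional path each.
        end = start + size + (1 if k < extra else 0)
        yield k + 1, total, paths[start:end]
        start = end


def load_in_chunks(paths, n_chunks, load_one, out=None):
    """Load every path chunk by chunk, writing one progress line per chunk."""
    if out is None:
        out = sys.stderr  # keep progress separate from any data on stdout
    loaded = []
    for k, total, chunk in iter_chunks(paths, n_chunks):
        print(f"loading chunk {k}/{total} ({len(chunk)} files)", file=out)
        loaded.extend(load_one(p) for p in chunk)
    print(f"loaded {len(loaded)} of {len(paths)} files", file=out)
    return loaded
```

For the 51,446 files mentioned above and 20 chunks, `divmod(51446, 20)` is `(2572, 6)`, so six chunks hold 2573 files and fourteen hold 2572. Progress goes to stderr by default; pass `out=sys.stdout` to send it to stdout instead.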