consider adding save_pathlist_of_signatures method? #1365

Closed
bluegenes opened this issue Mar 4, 2021 · 6 comments

Comments

@bluegenes
Contributor

I'm constantly using --from-file file lists these days, so a python api function for saving a file list of sigs (save_list_of_signature_files or save_file_list_of_signatures) might be handy.

At the CLI level, this would enable optionally emitting a list of signature files from sourmash sketch, for use in downstream applications. I could also see it being useful as output of search or prefetch, etc

related to #1350, #1352
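A minimal sketch of what such a helper might look like, using only the standard library. The function name and signature here are hypothetical (nothing like this is confirmed to exist in the sourmash API); it just writes one signature path per line, dropping duplicates.

```python
from pathlib import Path
from typing import Iterable, List


def save_pathlist_of_signatures(sig_paths: Iterable[str], output: str) -> List[str]:
    """Write one signature file path per line to `output`.

    Hypothetical helper for illustration; not part of the sourmash API.
    Returns the de-duplicated paths in the order they were written.
    """
    seen = set()
    unique = []
    for path in sig_paths:
        path = str(path)
        if path not in seen:  # drop duplicates, keep first-seen order
            seen.add(path)
            unique.append(path)
    Path(output).write_text("".join(f"{p}\n" for p in unique))
    return unique
```

The resulting text file is the kind of list that could then be handed to `--from-file`.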

@ctb
Contributor

ctb commented Mar 4, 2021

right, I think this is another situation where manifests or collections of signatures would be great!

unfortunately, right now the challenge is that we have no way to refer to signatures in indices / collections. so we could easily do this for signatures sitting in individual files, but it would be hard to do signatures in .sbt.zip files or LCA.json files.

Probably need something like file://path/to/sbt.zip#md5sum as a general designator that sourmash can both consume and produce.
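As a rough illustration of that idea, here is one way a `container#md5` reference could be parsed and re-emitted. This is only a sketch under assumptions: the `SigLocation` type and both function names are made up for this example, and the exact syntax sourmash would accept is not specified in this thread.

```python
from typing import NamedTuple, Optional


class SigLocation(NamedTuple):
    container: str       # path to the file or collection holding the signature
    md5: Optional[str]   # md5 of one signature inside it, if given


def parse_sig_designator(designator: str) -> SigLocation:
    """Split 'file://path/to/sbt.zip#md5sum' into (container, md5)."""
    url, _, fragment = designator.partition("#")
    prefix = "file://"
    container = url[len(prefix):] if url.startswith(prefix) else url
    return SigLocation(container, fragment or None)


def format_sig_designator(loc: SigLocation) -> str:
    """Inverse of parse_sig_designator (hypothetical output format)."""
    base = f"file://{loc.container}"
    return f"{base}#{loc.md5}" if loc.md5 else base
```

A plain signature file would simply have no fragment, so the same form covers both individual files and signatures inside collections.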

@ctb ctb changed the title consider adding save_file_list_of_signatures method? consider adding save_pathlist_of_signatures method? Apr 30, 2021
@ctb
Contributor

ctb commented May 8, 2021

we're getting closer - #1493 added generic "save collections of signatures" functionality, and we could add this functionality to that code, I think. Here we'd still need a way to cherry-pick signatures from within collections, which doesn't yet exist, tho.

also ref #1312

@bluegenes
Contributor Author

bluegenes commented Jun 24, 2021

With picklists and now manifests on the horizon, we should be able to update collections: read the manifest, and only sketch signatures if they do not already exist.

I think this would involve building the manifest row for a signature that would be generated, checking the existing manifest, and then sketching + adding to the collection if needed. I know this doesn't save a lot of time for a couple signatures, but could save a lot for large collections. Would love to hear thoughts about the downsides of enabling this, though!

My thinking = now that we can extract specific signatures using picklists, folks can keep all their query signatures together in a single (or multiple) zipfile collections, if they want! If this is the case, you can imagine that sometimes folks (me) want to add an additional sample or ksize without needing to regenerate the entire collection.
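The check described above could be sketched roughly like this. Everything here is an assumption made for illustration: the `ManifestKey` fields are a guess at which metadata would identify a sketch before it is built, and `sketches_to_build` is not a sourmash function.

```python
from typing import Iterable, List, NamedTuple, Set


class ManifestKey(NamedTuple):
    # Hypothetical subset of manifest columns that is known before sketching.
    identifier: str
    moltype: str
    ksize: int
    filename: str


def sketches_to_build(
    wanted: Iterable[ManifestKey], existing: Set[ManifestKey]
) -> List[ManifestKey]:
    """Return the requested sketches that are not already in the collection."""
    return [key for key in wanted if key not in existing]
```

Only the keys returned here would need to be sketched and added, which is where the savings for large collections would come from.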

@ctb
Contributor

ctb commented Jun 24, 2021

> With picklists and now manifests on the horizon, we should be able to update collections: read the manifest, and only sketch signatures if they do not already exist.
>
> I think this would involve building the manifest row for a signature that would be generated, checking the existing manifest, and then sketching + adding to the collection if needed. I know this doesn't save a lot of time for a couple signatures, but could save a lot for large collections. Would love to hear thoughts about the downsides of enabling this, though!
>
> My thinking = now that we can extract specific signatures using picklists, folks can keep all their query signatures together in a single (or multiple) zipfile collections, if they want! If this is the case, you can imagine that sometimes folks (me) want to add an additional sample or ksize without needing to regenerate the entire collection.

this is an interesting idea, and actually a pretty good use case for IPFS and Redis storage backends, too - people wouldn't need to track filenames if their personal collection of signatures was just floating around in a database namespace somewhere.

However, I'm curious about the connection to this issue in particular - I don't immediately see a strong connection, and the functionality seems mostly distinct. Do we want to make a new issue on this so the idea doesn't get buried here? (Or is there a strong connection that I'm missing?)

Minor note - only a subset of the signature metadata would apply - identifier, moltype, ksize, and filename, I think.

@bluegenes
Contributor Author

Hah, it's the logical 'next step' of what I want for saving signatures, but you're right - not so connected here. Will make new one!

@ctb
Contributor

ctb commented Mar 26, 2022

hey, look, the core functionality for the original request is provided by #1891 in combination with sourmash sig manifest! Between picklists and manifests we have so much nice functionality here that I think mostly what we need is a somewhat better way of outputting summary or merged manifests that include references to the input Index objects.

I wrote that up over in #1902.
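For the original request, a path list could in principle be derived from a manifest along these lines. This sketch assumes the manifest is a CSV whose comment lines start with `#` and which has a column naming where each signature lives; the default column name `internal_location` is an assumption and should be checked against the actual manifest output. The function name is hypothetical.

```python
import csv
import io
from typing import List, TextIO


def pathlist_from_manifest(fp: TextIO, column: str = "internal_location") -> List[str]:
    """Collect the unique values of `column` from a manifest-style CSV.

    Hypothetical helper; the column name is an assumption about the format.
    """
    # Skip any line starting with '#', then parse the rest as CSV with a header.
    rows = csv.DictReader(line for line in fp if not line.startswith("#"))
    paths: List[str] = []
    for row in rows:
        location = row[column]
        if location not in paths:  # keep first-seen order, no duplicates
            paths.append(location)
    return paths


# Example with made-up manifest content:
example = io.StringIO(
    "# comment line\n"
    "internal_location,md5,ksize\n"
    "a.sig,abc,31\n"
    "b.sig,def,31\n"
    "a.sig,ghi,21\n"
)
print(pathlist_from_manifest(example))
```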
