Write DelayedArray to file on the R side. #35
Latest commit solves the to-do above, but needs a lot of testing. Briefly, we set up extensible HDF5 datasets and then iterate across row-wise chunks of the sparse DA. In each chunk, we extract the non-zero values, convert them to compressed sparse row format, and append them to the existing HDF5 dataset. The code here makes a number of assumptions:
Added a couple of tests and fixed how the loop for rewriting matrices handled paths.
Hit a roadblock in the form of grimbough/rhdf5#79; the AnnData reader doesn't like the fixed-width byte strings that rhdf5 emits. For the time being, I suggest we just add a clause to the initial check for DAs where we do not skip a DA if it
@lazappi were you going to work on this?
Sorry, I've lost track of it a bit. Is it just the
Yes, I think line 60 in my PR should only skip the assay if it's a DA and
Ok, I've added that check. The code for writing sparse DelayedArrays can't be reached by anything now, which made the test coverage drop, but that's fine, and it's better to have it there for when the rhdf5 issue is fixed. Anything else to add?
Think that's it, just slap on some
Closes #32.
Some points:
readH5AD.
Known to-dos:
is_sparse() being TRUE. A little bit of work is required to achieve a one-pass writer that does not rely on knowing the total number of non-zero entries.
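The chunk-wise approach described in the conversation (iterate over row-wise chunks, pull out the non-zero values, convert to compressed sparse row form, append) can be sketched roughly as below. This is only an illustration in Python, not the R code from this PR: plain lists stand in for the extensible HDF5 datasets, and the function name append_csr_chunks and the chunk_size parameter are hypothetical.

```python
def append_csr_chunks(rows, n_cols, chunk_size=2):
    """Build CSR arrays in a single pass over row-wise chunks.

    Plain Python lists stand in for extensible HDF5 datasets here
    (an assumption for illustration). Each chunk's non-zero entries
    are appended as they are found, so the total number of non-zero
    entries never has to be known up front.
    """
    data = []      # non-zero values, in row-major order
    indices = []   # column index of each non-zero value
    indptr = [0]   # running offsets: row i spans indptr[i]:indptr[i + 1]

    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]  # one block of rows
        for row in chunk:
            if len(row) != n_cols:
                raise ValueError("every row must have n_cols entries")
            for col, value in enumerate(row):
                if value != 0:
                    data.append(value)
                    indices.append(col)
            # Close the row by recording how many values exist so far.
            indptr.append(len(data))

    return data, indices, indptr


dense = [
    [0, 3, 0],
    [1, 0, 0],
    [0, 0, 0],
    [4, 0, 5],
]
data, indices, indptr = append_csr_chunks(dense, n_cols=3)
```

Because every append only depends on the rows seen so far, the result should not depend on the chunk size; in a real writer each append would instead grow the on-disk dataset before writing the new block.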