implement reading/writing HDF5 object references in attribute #96
Codecov Report
@@            Coverage Diff             @@
##           master      #96      +/-   ##
==========================================
- Coverage   74.84%   74.80%   -0.05%
==========================================
  Files          34       34
  Lines        1805     1814       +9
==========================================
+ Hits         1351     1357       +6
- Misses        454      457       +3
h5writeAttribute(object_to_reference, object, attribute_name) will write a reference to object_to_reference into attribute_name of object. Similarly, if attr is an attribute containing an object reference, H5Aread(attr) will return an H5IdComponent of the referenced object.
Thanks @ilia-kats for creating this pull request, and apologies for taking so long to get round to doing something with it. I'm not against incorporating something like this in principle, but I wonder if adding this functionality to I've never used HDF5 object references before, so I'm curious what you're doing with them. Do you have any example code or schematics for the file type you're developing?
Thanks for your reply. I'm working on a pure-R implementation of the AnnData format for single-cell omics data. AnnData uses object references to handle categorical (factor) columns in data frames: the HDF5 object for the column stores the integer codes along with a reference to another HDF5 object that stores the labels (code). AnnData has been around for a while and there are tons of these files in circulation, so I'm not really flexible regarding the format. I briefly looked into a low-level wrapper around HDF5 references when I was writing this PR, but wrapping the entire API would take quite some time, which is why I chose to implement this directly in
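The codes-plus-labels layout described above can be sketched in base R without touching HDF5 at all. This is only an illustration: the variable names and the `decode_categorical` helper are hypothetical, and the zero-based codes with -1 for missing values are an assumption based on how pandas categoricals are usually encoded, not something stated in this thread.

```r
## A factor column as it might appear in an R data frame (example data).
cell_type <- factor(c("B", "T", "T", NA, "NK"), levels = c("B", "NK", "T"))

## Split the factor into the two pieces that would be stored separately:
## the labels, and the integer codes that index into them.
## ASSUMPTION: codes are zero-based and -1 marks a missing value.
labels <- levels(cell_type)
codes <- as.integer(cell_type) - 1L
codes[is.na(codes)] <- -1L

## Hypothetical helper: rebuild the factor from codes and labels.
decode_categorical <- function(codes, labels) {
  idx <- codes + 1L                # back to R's one-based indexing
  idx[codes < 0L] <- NA_integer_   # negative codes become NA
  factor(labels[idx], levels = labels)
}

restored <- decode_categorical(codes, labels)
```

In a file, the object reference would play the role of the `labels` argument here: the codes object points at the labels object instead of carrying a copy of it.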
I've tried to make the complete H5R API from HDF5 1.10 available in the object-references branch. Thanks a lot for the starting point, it was helpful to build on your code. This now supports dataset region references too, if you happen to need those at any point.

Having used the functions, I can see why some wrapper functions to do the dereferencing automatically would be nice, and I'll probably add those fairly soon, but I don't have time right now. However, this API should remain pretty stable if you want to work with that. I'll merge it into bioc-devel once I've written a few tests and the manual pages.

Hopefully the examples below are useful, but it looks like you know what you're doing. Let me know if anything is missing or doesn't behave as expected.

## create an example file with a group and a dataset
library(rhdf5)
file_name <- tempfile()
h5createFile(file_name)
h5createGroup(file = file_name, group = "/foo")
#> [1] TRUE
h5write(1:100, file=file_name, name="/foo/baa")
###################################################
## Writing references as an attribute #############
###################################################
## open file and create reference to /foo/baa dataset
fid <- H5Fopen(file_name)
ref_to_dataset <- H5Rcreate(fid, name = "/foo/baa")
## create an attribute to contain our object ref
sid <- H5Screate_simple( length(ref_to_dataset) )
tid <- H5Tcopy(dtype_id = "H5T_STD_REF_OBJ")
obj_ref_attr <- H5Acreate(fid, name = "object_refs", dtype_id = tid, h5space = sid)
## write our references to the attribute & close
H5Awrite(h5attribute = obj_ref_attr, buf = ref_to_dataset)
#> Object reference
## tidy up
H5Aclose(obj_ref_attr)
H5Sclose(sid)
H5Fclose(fid)
###################################################
## Reading reference & dereferencing dataset ######
###################################################
## open file and read attribute
fid <- H5Fopen(file_name)
aid <- H5Aopen(h5obj = fid, name = 'object_refs')
references <- H5Aread(h5attribute = aid)
## this is an H5Ref object
references
#> HDF5 REFERENCE
#> Type: H5R_OBJECT
#> Length: 1
## apply the ref to the file handle and receive a dataset identifier
dset_from_ref <- H5Rdereference(ref = references, h5loc = fid)
H5Dread(dset_from_ref)
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100
## tidy up
H5Aclose(aid)
H5Dclose(dset_from_ref)
H5Fclose(fid)
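As a rough sketch of what a dereferencing wrapper like the one mentioned above might look like, the read steps can be folded into a single function. `read_referenced_dataset` is a hypothetical name and is not part of rhdf5; it only uses the calls shown in the example above, and it assumes the attribute holds exactly one object reference that points at a dataset.

```r
library(rhdf5)

## Hypothetical helper (not an rhdf5 function): read the dataset that an
## object-reference attribute on `h5loc` points to.
## ASSUMPTION: the attribute stores a single reference to a dataset.
read_referenced_dataset <- function(h5loc, attr_name) {
  aid <- H5Aopen(h5obj = h5loc, name = attr_name)
  on.exit(H5Aclose(aid), add = TRUE)   # close the attribute on exit
  ref <- H5Aread(h5attribute = aid)
  did <- H5Rdereference(ref = ref, h5loc = h5loc)
  on.exit(H5Dclose(did), add = TRUE)   # close the dataset on exit
  H5Dread(did)
}

## Build the same kind of file as in the example above.
file_name <- tempfile()
h5createFile(file_name)
h5createGroup(file = file_name, group = "/foo")
h5write(1:100, file = file_name, name = "/foo/baa")

fid <- H5Fopen(file_name)
ref <- H5Rcreate(fid, name = "/foo/baa")
sid <- H5Screate_simple(length(ref))
tid <- H5Tcopy(dtype_id = "H5T_STD_REF_OBJ")
aid <- H5Acreate(fid, name = "object_refs", dtype_id = tid, h5space = sid)
H5Awrite(h5attribute = aid, buf = ref)
H5Aclose(aid)
H5Sclose(sid)

## One call replaces the open / read / dereference / read / close sequence.
values <- read_referenced_dataset(fid, "object_refs")
H5Fclose(fid)
```

Registering the close calls with `on.exit()` means the attribute and dataset handles are released even if one of the intermediate steps fails, which is the main thing the manual sequence leaves to the caller.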