cryohub is a library for reading and writing Cryo-ET data based on the cryotypes specification.
pip install cryohub
cryohub provides granular I/O functions such as read_star and read_mrc, all of which return objects following the cryotypes specification.
from cryohub.reading import read_star
poseset = read_star('/path/to/file.star')
A higher-level function called read adds some magic to the I/O procedure, guessing file formats and returning a list of cryotypes.
from cryohub import read
data = read('/path/to/file.star', '/path/to/directory/', lazy=False, name_regex=r'tomo_\d+')
See the help for each function for more info.
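To illustrate what "guessing file formats" can mean, here is a minimal sketch of suffix-based dispatch built from the formats listed further down in this README. The FORMATS table and the guess_format and group_by_format helpers are hypothetical and not part of cryohub; its read may use different or additional heuristics.

```python
from pathlib import Path

# Hypothetical suffix -> format table, based on the formats listed in this README.
# This is NOT cryohub's internal lookup, just an illustration of the idea.
FORMATS = {
    ".mrc": "mrc", ".mrcs": "mrc", ".st": "mrc", ".map": "mrc", ".rec": "mrc",
    ".tif": "tiff", ".tiff": "tiff",
    ".em": "em",
    ".hdf": "hdf",
    ".star": "star",
    ".tbl": "tbl",
    ".cbox": "cbox", ".box": "box",
    ".json": "json",
}


def guess_format(path):
    """Return a format label based on the file suffix, or None if it is unknown."""
    return FORMATS.get(Path(path).suffix.lower())


def group_by_format(paths):
    """Group paths by guessed format, silently skipping unknown suffixes."""
    groups = {}
    for p in map(Path, paths):
        fmt = guess_format(p)
        if fmt is None:
            continue  # a real reader might raise an error here instead
        groups.setdefault(fmt, []).append(p)
    return groups


print(group_by_format(["particles.star", "tomo_01.rec", "notes.txt"]))
```

A real implementation would then hand each group to the matching read_* function; matching on suffix alone is the simplest heuristic, but it cannot tell apart files that share an extension yet differ in content.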
Similarly to the read_* functions, cryohub provides a series of write_* functions and a magic higher-level write function.
from cryohub import write
write([poseset1, poseset2], 'particles.tbl')
cryohub can be used as a conversion tool between all available formats:
cryohub convert input_file.star output_file.tbl
If instead you just need to quickly inspect your data but want something more powerful than reading text files or headers, this command will drop you into an IPython shell with the loaded data collected in a list called data:
cryohub view path/to/files/* /other/path/to/file.star
print(data[0])
Currently, cryohub can read images in the following formats:
- .mrc (and the .mrcs, .st, .map, .rec variants)
- .tif(f)
- Dynamo .em
- EMAN2 .hdf
and particle data in the following formats:
- Relion .star
- Dynamo .tbl
- Cryolo .cbox and .box
- EMAN2 .json [1]
Writer functions currently exist for:
- .mrc
- EMAN2 .hdf
- Dynamo .em
- Relion .star
- Dynamo .tbl
When possible (and unless disabled), cryohub loads images lazily using dask. The resulting objects can be treated as normal numpy arrays, except that you need to call array.compute() to apply any pending operations and return the result.
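The sketch below shows the general dask pattern this refers to, using a small in-memory array as a stand-in for a tomogram. It assumes only numpy and dask are installed; how cryohub actually builds its dask arrays from files on disk is not shown here and may differ.

```python
import numpy as np
import dask.array as da

# Stand-in for a lazily loaded tomogram: a small numpy array wrapped as a
# chunked dask array (a file-backed reader would build this from disk instead).
data = np.arange(64, dtype=np.float32).reshape(4, 4, 4)
volume = da.from_array(data, chunks=(2, 4, 4))

# These operations are only recorded in a task graph; nothing is computed yet.
normalized = (volume - volume.mean()) / volume.std()
central_slice = normalized[2]

# .compute() executes the pending operations and returns a plain numpy array.
result = central_slice.compute()
print(type(result), result.shape)
```

Deferring work like this means slicing a single plane out of a large volume does not require reading the whole file into memory first.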
Contributions are more than welcome! If there is a file format you wish were supported for reading or writing, simply open an issue pointing to its specification. Alternatively, feel free to open a PR with your proposed implementation, using the existing functions for inspiration.
Footnotes
[1] EMAN2 uses the center of the tomogram as the origin for particle coordinates. This means that when reading these particles together with a tomogram, you'll have to recenter them based on the tomogram's dimensions. To do so automatically, use the center_on_tomo argument to provide the .hdf file containing the tomogram you want to use.
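The recentering itself is a simple shift by half the tomogram size along each axis. The sketch below is not cryohub's implementation: the recenter helper is hypothetical, and it assumes pixel coordinates in (x, y, z) order, a volume shape in numpy's (z, y, x) order, and a center at size / 2 (some tools use (size - 1) / 2 instead, which shifts results by half a pixel).

```python
import numpy as np


def recenter(coords_xyz, volume_shape_zyx):
    """Shift center-origin particle coordinates to a corner origin.

    coords_xyz: (N, 3) array of pixel coordinates in x, y, z order, relative
        to the tomogram center (assumed EMAN2-style convention).
    volume_shape_zyx: tomogram shape as returned by numpy, i.e. (z, y, x).
    """
    coords = np.asarray(coords_xyz, dtype=float)
    # numpy shapes are (z, y, x); reverse them to get sizes in x, y, z order
    size_xyz = np.asarray(volume_shape_zyx[::-1], dtype=float)
    # assumed convention: the center sits at size / 2 along each axis
    return coords + size_xyz / 2


print(recenter([[0, 0, 0], [-10, 5, 2]], (100, 200, 300)))
```

For a tomogram of shape (100, 200, 300), a particle at the center (0, 0, 0) lands at (150, 100, 50) in corner-origin coordinates.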