This repository contains general Python utilities commonly used in the RXN universe.
For utilities related to chemistry, see our other repository rxn-chemutils.
This package is supported on all operating systems. It has been tested on the following systems:
- macOS: Big Sur (11.1)
- Linux: Ubuntu 18.04.4
A Python version of 3.6 or greater is recommended.
The package can be installed from PyPI:
pip install rxn-utils
For local development, the package can be installed with:
pip install -e ".[dev]"
- load_list_from_file: read a file into a list of strings.
- iterate_lines_from_file: same as load_list_from_file, but produces an iterator instead of a list. This can be much more memory-efficient.
- dump_list_to_file and append_to_file: write an iterable of strings to a file (one per line).
- named_temporary_path and named_temporary_directory: provide a context with a file or directory that will be deleted when the context closes. Useful for unit tests.
  >>> with named_temporary_path() as temporary_path:
  ...     # do something on the temporary path.
  ...     # The file or directory at that path will be deleted at the
  ...     # end of the context, except if delete=False.
- ... and others.
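To illustrate the two ideas behind the file helpers above (lazy line iteration and a self-cleaning temporary path), here is a minimal standard-library sketch. The names temporary_path_sketch and iterate_lines_sketch are hypothetical and are not part of this package; the package's own functions may behave differently in the details.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def temporary_path_sketch(delete: bool = True) -> Iterator[Path]:
    """Yield a path inside a fresh temporary directory (hypothetical helper)."""
    directory = Path(tempfile.mkdtemp())
    try:
        # Nothing is created at this path yet; the caller decides what to write.
        yield directory / "scratch.txt"
    finally:
        if delete:
            # Remove the directory and whatever was created inside it.
            shutil.rmtree(directory, ignore_errors=True)


def iterate_lines_sketch(path: Path) -> Iterator[str]:
    """Yield the lines of a file one at a time, without line terminators."""
    with open(path, "rt") as f:
        for line in f:
            yield line.rstrip("\r\n")


with temporary_path_sketch() as temporary_path:
    temporary_path.write_text("first\nsecond\n")
    lines = list(iterate_lines_sketch(temporary_path))

print(lines)
print(temporary_path.exists())
```

Because iterate_lines_sketch is a generator, only one line is held in memory at a time unless the caller collects the results, as the example does with list().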
- The function iterate_csv_column and the related executable rxn-extract-csv-column provide an easy way to extract one single column from a CSV file.
- The StreamingCsvEditor allows for doing a series of operations on a CSV file without loading it fully into memory. This is for instance used in rxn-reaction-preprocessing. See a few examples in the unit tests.
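The streaming approach to column extraction can be sketched with the standard csv module alone. The function iterate_column_sketch below is a hypothetical stand-in written for illustration; it is not the signature of iterate_csv_column, and the sample data is made up.

```python
import csv
import io
from typing import Iterable, Iterator


def iterate_column_sketch(lines: Iterable[str], column: str) -> Iterator[str]:
    """Yield the values of one named column, reading row by row."""
    reader = csv.DictReader(lines)
    for row in reader:
        yield row[column]


# An in-memory file stands in for a CSV file on disk.
csv_text = "id,name\n1,first\n2,second\n"
values = list(iterate_column_sketch(io.StringIO(csv_text), "name"))
print(values)
```

Since rows are consumed one at a time, the whole file never needs to fit in memory.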
For reproducible shuffling, or for shuffling two files of identical length so that the same permutation is obtained, one can use the stable_shuffle function. The executable rxn-stable-shuffle is also provided for this purpose. Both also work with CSV files if the appropriate flag is provided.
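The idea of applying one permutation to two sequences of the same length can be sketched with a seeded random generator from the standard library. This is only an illustration under the assumption that a fixed seed is what makes the shuffle reproducible; seeded_shuffle_sketch is a hypothetical name, not the implementation or signature of stable_shuffle.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def seeded_shuffle_sketch(values: Sequence[T], seed: int) -> List[T]:
    """Return a shuffled copy; the permutation depends only on length and seed."""
    shuffled = list(values)
    # A private generator keeps the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    return shuffled


sources = ["a", "b", "c", "d", "e"]
targets = ["A", "B", "C", "D", "E"]

shuffled_sources = seeded_shuffle_sketch(sources, seed=42)
shuffled_targets = seeded_shuffle_sketch(targets, seed=42)

# Paired items stay aligned because both lists received the same permutation.
for source, target in zip(shuffled_sources, shuffled_targets):
    print(source, target)
```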
For batching an iterable into lists of a specified size, chunker comes in handy. It also does so in a memory-efficient way.
>>> from rxn.utilities.containers import chunker
>>> for chunk in chunker(range(1, 10), chunk_size=4):
... print(chunk)
[1, 2, 3, 4]
[5, 6, 7, 8]
[9]
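One way such memory-efficient batching can work is to pull items from the iterator only as each chunk is requested. The sketch below uses itertools.islice; chunker_sketch is a hypothetical name and is not necessarily how chunker itself is implemented.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunker_sketch(iterable: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most chunk_size items."""
    iterator = iter(iterable)
    while True:
        # Only chunk_size items are materialized at any time.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


chunks = list(chunker_sketch(range(1, 10), chunk_size=4))
print(chunks)
```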
remove_duplicates (or iterate_unique_values, its memory-efficient variant) removes duplicates from a container, possibly based on a callable instead of the values:
>>> from rxn.utilities.containers import remove_duplicates
>>> remove_duplicates([3, 6, 9, 2, 3, 1, 9])
[3, 6, 9, 2, 1]
>>> remove_duplicates(["ab", "cd", "efg", "hijk", "", "lmn"], key=lambda x: len(x))
['ab', 'efg', 'hijk', '']
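The order-preserving, key-based deduplication shown above can be sketched as a generator that remembers the keys it has already seen. The name unique_values_sketch is hypothetical; this is an illustration of the technique, not the package's code.

```python
from typing import Callable, Hashable, Iterable, Iterator, Optional, Set, TypeVar

T = TypeVar("T")


def unique_values_sketch(
    values: Iterable[T], key: Optional[Callable[[T], Hashable]] = None
) -> Iterator[T]:
    """Yield each value whose key has not been seen before, in input order."""
    seen: Set[Hashable] = set()
    for value in values:
        marker = value if key is None else key(value)
        if marker not in seen:
            seen.add(marker)
            yield value


by_value = list(unique_values_sketch([3, 6, 9, 2, 3, 1, 9]))
by_length = list(unique_values_sketch(["ab", "cd", "efg", "hijk", "", "lmn"], key=len))
print(by_value)
print(by_length)
```

Only the set of seen keys grows with the input; the values themselves are yielded one by one.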
regex.py provides a few functions that make it easier to build regex strings (considering whether segments should be optional, capturing, etc.).
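As a rough illustration of what composing regex strings from small helpers can look like, here is a sketch with two hypothetical functions, capturing and optional. These names are assumptions made for this example and may not match what regex.py actually provides.

```python
import re


def capturing(segment: str) -> str:
    """Wrap a regex segment in a capturing group (hypothetical helper)."""
    return f"({segment})"


def optional(segment: str) -> str:
    """Make a regex segment optional via a non-capturing group (hypothetical helper)."""
    return f"(?:{segment})?"


# An integer part, optionally followed by a dot and a captured decimal part.
pattern = capturing(r"\d+") + optional(r"\." + capturing(r"\d+"))

with_decimals = re.fullmatch(pattern, "12.5")
without_decimals = re.fullmatch(pattern, "12")
print(pattern)
print(with_decimals.groups(), without_decimals.groups())
```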
- A custom, more general enum class, RxnEnum.
- remove_prefix, remove_postfix.
- Initialization of loggers, in a logging-compatible way: logging.py.
- sandboxed_random_context and temporary_random_seed, to create a context with a specific random state that will not have side effects. Especially useful for testing purposes (unit tests).
- ... and others.
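The idea of a random context without side effects can be sketched by saving and restoring the state of Python's random module. The name sandboxed_random_sketch is hypothetical, and this version only covers the standard random module; whether the package's own helpers also handle other random number generators is not assumed here.

```python
import contextlib
import random
from typing import Iterator, Optional


@contextlib.contextmanager
def sandboxed_random_sketch(seed: Optional[int] = None) -> Iterator[None]:
    """Run a block with its own random state, then restore the previous one."""
    state = random.getstate()
    try:
        if seed is not None:
            random.seed(seed)
        yield
    finally:
        # Restoring the state hides any draws made inside the block.
        random.setstate(state)


# Reference: the value the global generator produces right after seeding with 0.
random.seed(0)
expected_next = random.random()

random.seed(0)
with sandboxed_random_sketch(seed=123):
    first = random.random()
with sandboxed_random_sketch(seed=123):
    second = random.random()
actual_next = random.random()

print(first == second, actual_next == expected_next)
```

In a unit test, this pattern lets one test fix a seed without changing the random sequence seen by the tests that run after it.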