RXN utilities package

This repository contains general Python utilities commonly used in the RXN universe. For utilities related to chemistry, see our other repository rxn-chemutils.

System Requirements

This package is supported on all operating systems. It has been tested on the following systems:

macOS: Big Sur (11.1)
Linux: Ubuntu 18.04.4

A Python version of 3.6 or greater is recommended.

Installation guide

The package can be installed from PyPI:

pip install rxn-utils

For local development, the package can be installed with:

pip install -e ".[dev]"

Package highlights

File-related utilities

load_list_from_file: read a file into a list of strings.
iterate_lines_from_file: same as load_list_from_file, but produces an iterator instead of a list. This can be much more memory-efficient.
dump_list_to_file and append_to_file: write an iterable of strings to a file (one per line).
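
To illustrate the difference between the list-based and the iterator-based readers described above, here is a minimal standard-library sketch. The names iterate_lines_sketch, load_list_sketch and dump_list_sketch are hypothetical stand-ins written for this example; they are not the package's functions, and the actual signatures in rxn-utils may differ.

```python
import os
import tempfile
from typing import Iterable, Iterator, List


def iterate_lines_sketch(path: str) -> Iterator[str]:
    """Yield the lines of a text file one by one, without line endings."""
    with open(path, "rt") as f:
        for line in f:
            yield line.rstrip("\r\n")


def load_list_sketch(path: str) -> List[str]:
    """Read the whole file into a list (simple, but holds everything in memory)."""
    return list(iterate_lines_sketch(path))


def dump_list_sketch(values: Iterable[str], path: str) -> None:
    """Write one string per line."""
    with open(path, "wt") as f:
        for value in values:
            f.write(f"{value}\n")


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "lines.txt")
    dump_list_sketch(["first", "second", "third"], path)
    loaded = load_list_sketch(path)
    # The iterator version never materializes the full list.
    longest = max(iterate_lines_sketch(path), key=len)

print(loaded)   # ['first', 'second', 'third']
print(longest)  # second
```

The iterator variant is the better fit when a file is only scanned once, for instance to filter or count lines.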

named_temporary_path and named_temporary_directory: provide a context with a file or directory that will be deleted when the context closes. Useful for unit tests.

>>> with named_temporary_path() as temporary_path:
...     # do something on the temporary path.
...     # The file or directory at that path will be deleted at the
...     # end of the context, except if delete=False.
...     pass
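
The behavior described above (a path that is cleaned up when the context closes, unless delete=False) can be approximated with the standard library alone. The sketch below uses a hypothetical temporary_path_sketch function; it is an assumption about how such a context manager could be written, not the package's implementation.

```python
import contextlib
import os
import shutil
import tempfile
from typing import Iterator


@contextlib.contextmanager
def temporary_path_sketch(delete: bool = True) -> Iterator[str]:
    """Provide a path that does not exist yet; clean it up on exit."""
    # A private parent directory guarantees that the name is unique.
    parent = tempfile.mkdtemp()
    try:
        yield os.path.join(parent, "scratch")
    finally:
        if delete:
            # Removes the parent and whatever was created at the path.
            shutil.rmtree(parent, ignore_errors=True)


with temporary_path_sketch() as path:
    with open(path, "wt") as f:
        f.write("some test data\n")
    existed_inside = os.path.exists(path)

exists_after = os.path.exists(path)
print(existed_inside, exists_after)  # True False
```

Because the cleanup sits in a finally clause, the path is also removed when the body of the context raises, which is what makes this pattern convenient in unit tests.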

... and others.

CSV-related functionality

The function iterate_csv_column and the related executable rxn-extract-csv-column provide an easy way to extract a single column from a CSV file.
The StreamingCsvEditor applies a series of operations to a CSV file without loading it fully into memory. It is used, for instance, in rxn-reaction-preprocessing. See the unit tests for a few examples.
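
As a rough illustration of the two ideas above (extracting one column, and editing a CSV row by row instead of loading it whole), here is a sketch built only on Python's csv module. The functions iterate_column_sketch and stream_transform_sketch are hypothetical and written for this example; they do not reproduce the API of iterate_csv_column or StreamingCsvEditor.

```python
import csv
import io
from typing import Callable, Iterator, TextIO


def iterate_column_sketch(stream: TextIO, column: str) -> Iterator[str]:
    """Yield the values of one column, reading one row at a time."""
    for row in csv.DictReader(stream):
        yield row[column]


def stream_transform_sketch(
    source: TextIO, target: TextIO, column: str, fn: Callable[[str], str]
) -> None:
    """Apply fn to one column and write each row out immediately."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(
        target, fieldnames=reader.fieldnames or [], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[column] = fn(row[column])
        writer.writerow(row)


data = "name,value\nalpha,1\nbeta,2\n"

names = list(iterate_column_sketch(io.StringIO(data), "name"))
print(names)  # ['alpha', 'beta']

output = io.StringIO()
stream_transform_sketch(io.StringIO(data), output, "name", str.upper)
print(output.getvalue())
```

In-memory streams are used here only to keep the example self-contained; the same functions work on open file handles.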

Stable shuffling

For reproducible shuffling, or for shuffling two files of identical length so that the same permutation is obtained, one can use the stable_shuffle function. The executable rxn-stable-shuffle is also provided for this purpose.

Both also work with CSV files if the appropriate flag is provided.
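
The idea of obtaining the same permutation for two sequences of identical length can be sketched with a seeded random generator from the standard library. The function seeded_shuffle_sketch and its seed parameter are assumptions made for this example; the actual stable_shuffle function may work differently and take other arguments.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def seeded_shuffle_sketch(items: Sequence[T], seed: int) -> List[T]:
    """Return a shuffled copy; the permutation depends only on seed and length."""
    shuffled = list(items)
    # A dedicated generator leaves the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    return shuffled


sources = ["a", "b", "c", "d", "e"]
targets = ["A", "B", "C", "D", "E"]

shuffled_sources = seeded_shuffle_sketch(sources, seed=42)
shuffled_targets = seeded_shuffle_sketch(targets, seed=42)

# The pairs stay aligned because both lists received the same permutation.
aligned = all(s.upper() == t for s, t in zip(shuffled_sources, shuffled_targets))
print(aligned)  # True
```

This is the property that matters when, for example, a file of inputs and a file of matching outputs must be shuffled consistently.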

chunker and remove_duplicates

chunker batches an iterable into lists of a specified size, and does so in a memory-efficient way.

>>> from rxn.utilities.containers import chunker
>>> for chunk in chunker(range(1, 10), chunk_size=4):
...     print(chunk)
[1, 2, 3, 4]
[5, 6, 7, 8]
[9]
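
For readers curious how such batching can stay memory-efficient, the following sketch uses itertools.islice so that only one chunk exists at a time. chunker_sketch is a hypothetical re-implementation for illustration, not the package's code.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunker_sketch(iterable: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls at most chunk_size items from the shared iterator.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


chunks = list(chunker_sketch(range(1, 10), chunk_size=4))
print(chunks)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```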

remove_duplicates (or iterate_unique_values, its memory-efficient variant) removes duplicates from a container, possibly based on a callable instead of the values:

>>> from rxn.utilities.containers import remove_duplicates
>>> remove_duplicates([3, 6, 9, 2, 3, 1, 9])
[3, 6, 9, 2, 1]
>>> remove_duplicates(["ab", "cd", "efg", "hijk", "", "lmn"], key=lambda x: len(x))
['ab', 'efg', 'hijk', '']
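
The order-preserving, key-based deduplication shown above can be sketched as a generator that remembers the keys it has already seen. iterate_unique_sketch is a hypothetical name used for this example, and the sketch assumes that the keys are hashable; the package's own functions may handle more cases.

```python
from typing import Callable, Hashable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def iterate_unique_sketch(
    values: Iterable[T], key: Optional[Callable[[T], Hashable]] = None
) -> Iterator[T]:
    """Yield each value whose key has not been seen before, in input order."""
    seen = set()
    for value in values:
        marker = value if key is None else key(value)
        if marker not in seen:
            seen.add(marker)
            yield value


unique_numbers = list(iterate_unique_sketch([3, 6, 9, 2, 3, 1, 9]))
unique_lengths = list(
    iterate_unique_sketch(["ab", "cd", "efg", "hijk", "", "lmn"], key=len)
)
print(unique_numbers)  # [3, 6, 9, 2, 1]
print(unique_lengths)  # ['ab', 'efg', 'hijk', '']
```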

Regex utilities

regex.py provides a few functions that make it easier to build regex strings (considering whether segments should be optional, capturing, etc.).
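
To give an idea of what composing regex strings from small helpers can look like, here is a sketch with two hypothetical functions, group_sketch and optional_sketch. They are assumptions made for this example and are not taken from regex.py, whose function names and behavior may differ.

```python
import re


def group_sketch(pattern: str, capture: bool = False) -> str:
    """Wrap a pattern in a capturing or non-capturing group."""
    prefix = "(" if capture else "(?:"
    return f"{prefix}{pattern})"


def optional_sketch(pattern: str) -> str:
    """Make a whole segment optional without adding a capture group."""
    return group_sketch(pattern) + "?"


# A number such as "12.5", optionally followed by a unit.
number = r"[+-]?\d+" + optional_sketch(r"\.\d+")
unit = group_sketch("mg|g|kg", capture=True)
quantity = group_sketch(number, capture=True) + optional_sketch(r"\s*" + unit)

with_unit = re.fullmatch(quantity, "12.5 mg")
without_unit = re.fullmatch(quantity, "7")
print(with_unit.groups())     # ('12.5', 'mg')
print(without_unit.groups())  # ('7', None)
```

Building the pattern from named pieces keeps the decision about which segments capture, and which are optional, visible at the call site.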

Others

A custom, more general enum class, RxnEnum.
remove_prefix, remove_postfix.
Initialization of loggers, in a logging-compatible way: logging.py.
sandboxed_random_context and temporary_random_seed, to create a context with a specific random state that will not have side effects. Especially useful in unit tests.
... and others.
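
The idea behind a random context without side effects can be sketched by saving and restoring the state of Python's random module. sandboxed_random_sketch is a hypothetical function for this example; it only covers the standard-library generator, and the package's sandboxed_random_context and temporary_random_seed may behave differently.

```python
import contextlib
import random
from typing import Iterator, Optional


@contextlib.contextmanager
def sandboxed_random_sketch(seed: Optional[int] = None) -> Iterator[None]:
    """Run a block with its own random state, then restore the previous one."""
    state = random.getstate()
    try:
        if seed is not None:
            random.seed(seed)
        yield
    finally:
        # Restoring the saved state hides the block from the outer sequence.
        random.setstate(state)


random.seed(0)
expected = [random.random() for _ in range(3)]

random.seed(0)
observed = [random.random()]
with sandboxed_random_sketch(seed=42):
    inside_first = random.random()
observed += [random.random(), random.random()]

with sandboxed_random_sketch(seed=42):
    inside_second = random.random()

print(observed == expected)           # True
print(inside_first == inside_second)  # True
```

The outer sequence is identical with or without the sandboxed block, and the value drawn inside the block is reproducible.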
