rxn4chemistry/rxn-utilities

RXN utilities package

This repository contains general Python utilities commonly used in the RXN universe. For utilities related to chemistry, see our other repository rxn-chemutils.

System Requirements

This package is supported on all operating systems. It has been tested on the following systems:

  • macOS: Big Sur (11.1)

  • Linux: Ubuntu 18.04.4

A Python version of 3.6 or greater is recommended.

Installation guide

The package can be installed from PyPI:

pip install rxn-utils

For local development, the package can be installed with:

pip install -e ".[dev]"

Package highlights

File-related utilities

  • load_list_from_file: read a file into a list of strings.
  • iterate_lines_from_file: same as load_list_from_file, but produces an iterator instead of a list. This can be much more memory-efficient.
  • dump_list_to_file and append_to_file: write an iterable of strings to a file (one per line).
  • named_temporary_path and named_temporary_directory: provide a context with a file or directory that will be deleted when the context closes. Useful for unit tests.
    >>> with named_temporary_path() as temporary_path:
    ...     # do something on the temporary path.
    ...     # The file or directory at that path will be deleted at the
    ...     # end of the context, except if delete=False.
  • ... and others.
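
The difference between loading a list and iterating over lines can be illustrated with a minimal standard-library sketch. The helper names below (iterate_lines, load_lines, dump_lines) are hypothetical stand-ins written for this example; they are not the package's implementation and may differ from it in details such as newline handling.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, List


def iterate_lines(path: Path) -> Iterator[str]:
    """Yield the lines of a text file one by one, without line endings."""
    with open(path, "rt") as f:
        for line in f:
            yield line.rstrip("\r\n")


def load_lines(path: Path) -> List[str]:
    """Read all lines at once (simple, but the whole file is held in memory)."""
    return list(iterate_lines(path))


def dump_lines(values: Iterable[str], path: Path) -> None:
    """Write an iterable of strings to a file, one per line."""
    with open(path, "wt") as f:
        for value in values:
            f.write(f"{value}\n")


# The temporary directory (and the file in it) is removed when the context closes.
with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "lines.txt"
    dump_lines(["a", "b", "c"], path)

    lazy_lines = iterate_lines(path)
    first_line = next(lazy_lines)  # only the first line has been read so far
    lazy_lines.close()  # release the file handle before the directory is deleted

    all_lines = load_lines(path)
```

The iterator variant reads only as much of the file as the caller consumes, which is why it suits large files better than the list variant.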

CSV-related functionality

  • The function iterate_csv_column and the related executable rxn-extract-csv-column provide an easy way to extract a single column from a CSV file.
  • The StreamingCsvEditor applies a series of operations to a CSV file without loading it fully into memory. It is used, for instance, in rxn-reaction-preprocessing. See a few examples in the unit tests.
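
The idea of processing a CSV file row by row can be sketched with the standard csv module alone. The functions iterate_column and transform_column below are hypothetical names chosen for this illustration; they do not reproduce the signatures of iterate_csv_column or StreamingCsvEditor.

```python
import csv
import io
from typing import Callable, Iterator, TextIO


def iterate_column(stream: TextIO, column: str) -> Iterator[str]:
    """Yield the values of one column, reading a single row at a time."""
    reader = csv.DictReader(stream)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f'Column "{column}" not found.')
    for row in reader:
        yield row[column]


def transform_column(
    in_stream: TextIO, out_stream: TextIO, column: str, fn: Callable[[str], str]
) -> None:
    """Apply fn to one column and write each row out as soon as it is read."""
    reader = csv.DictReader(in_stream)
    writer = csv.DictWriter(
        out_stream, fieldnames=reader.fieldnames or [], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[column] = fn(row[column])
        writer.writerow(row)


csv_text = "id,smiles,score\n1,CCO,0.5\n2,CCN,0.7\n"

smiles = list(iterate_column(io.StringIO(csv_text), "smiles"))

edited = io.StringIO()
transform_column(io.StringIO(csv_text), edited, "smiles", str.lower)
edited_text = edited.getvalue()
```

Because neither function keeps more than one row at a time, memory use stays flat regardless of the file size.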

Stable shuffling

For reproducible shuffling, or for shuffling two files of identical length so that the same permutation is obtained, one can use the stable_shuffle function. The executable rxn-stable-shuffle is also provided for this purpose.

Both also work with CSV files if the appropriate flag is provided.
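
As a rough illustration of the concept (not the algorithm used by stable_shuffle), a fixed seed makes a shuffle reproducible, and applying the same seed to two sequences of equal length yields the same permutation for both. The function seeded_shuffle is a hypothetical helper for this sketch.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def seeded_shuffle(values: Sequence[T], seed: int) -> List[T]:
    """Return a shuffled copy; the permutation depends only on len(values) and seed."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Two aligned "files": line i of sources belongs with line i of targets.
sources = ["a1", "b1", "c1", "d1", "e1"]
targets = ["a2", "b2", "c2", "d2", "e2"]

shuffled_sources = seeded_shuffle(sources, seed=42)
shuffled_targets = seeded_shuffle(targets, seed=42)

# The first character identifies the pair, so alignment is easy to check.
still_aligned = all(s[0] == t[0] for s, t in zip(shuffled_sources, shuffled_targets))
```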

chunker and remove_duplicates

For batching an iterable into lists of a specified size, chunker comes in handy. It also does so in a memory-efficient way.

>>> from rxn.utilities.containers import chunker
>>> for chunk in chunker(range(1, 10), chunk_size=4):
...     print(chunk)
[1, 2, 3, 4]
[5, 6, 7, 8]
[9]
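
For readers curious how such batching can stay memory-efficient, here is one possible sketch based on itertools.islice. The function chunk_lazily is a hypothetical name, and the package's own chunker may be implemented differently.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunk_lazily(iterable: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most chunk_size items from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # Pull only the next chunk_size items; the rest remains unconsumed.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


chunks = list(chunk_lazily(range(1, 10), chunk_size=4))
```

Only one chunk is materialized at a time, so the input can be a generator or a file iterator of arbitrary length.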

remove_duplicates (or iterate_unique_values, its memory-efficient variant) removes duplicates from a container, possibly based on a callable instead of the values:

>>> from rxn.utilities.containers import remove_duplicates
>>> remove_duplicates([3, 6, 9, 2, 3, 1, 9])
[3, 6, 9, 2, 1]
>>> remove_duplicates(["ab", "cd", "efg", "hijk", "", "lmn"], key=lambda x: len(x))
['ab', 'efg', 'hijk', '']
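
The behavior shown above can be approximated with a generator that remembers which keys it has already seen. This is a sketch under the assumption that keys are hashable; unique_values is a hypothetical name, not the package's function.

```python
from typing import Callable, Hashable, Iterable, Iterator, Optional, Set, TypeVar

T = TypeVar("T")


def unique_values(
    values: Iterable[T], key: Optional[Callable[[T], Hashable]] = None
) -> Iterator[T]:
    """Yield each value whose key (or the value itself) has not been seen before."""
    seen: Set[Hashable] = set()
    for value in values:
        marker = value if key is None else key(value)
        if marker not in seen:
            seen.add(marker)
            yield value


by_value = list(unique_values([3, 6, 9, 2, 3, 1, 9]))
by_length = list(unique_values(["ab", "cd", "efg", "hijk", "", "lmn"], key=len))
```

The first occurrence of each key wins and the original order is preserved, matching the outputs in the examples above.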

Regex utilities

regex.py provides a few functions that make it easier to build regex strings (considering whether segments should be optional, capturing, etc.).
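
To give a feel for this style of regex construction, the sketch below composes a pattern from small helpers. The helper names (capturing, non_capturing, optional) are assumptions made for illustration and should not be read as the actual contents of regex.py.

```python
import re


def capturing(pattern: str) -> str:
    """Wrap a pattern in a capturing group."""
    return f"({pattern})"


def non_capturing(pattern: str) -> str:
    """Wrap a pattern in a non-capturing group."""
    return f"(?:{pattern})"


def optional(pattern: str) -> str:
    """Make a whole pattern optional without adding a capture group."""
    return f"{non_capturing(pattern)}?"


# An integer with an optional sign, followed by an optional unit.
integer = capturing(optional(r"[+-]") + r"\d+")
unit = optional(r"\s*" + capturing(r"[a-zA-Z]+"))
quantity = re.compile("^" + integer + unit + "$")

with_unit = quantity.match("-12 mg")
without_unit = quantity.match("7")
```

Building the pattern from named pieces keeps the grouping explicit, which is harder to verify in a single hand-written regex string.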

Others
