unified-io is a Python utility that attempts to unify several I/O operations (i.e., read/write data in different formats) under a similar interface while making them more concise and user-friendly.
The library provides a unified interface for reading/writing files, which is based on the following principles:
- Read/write interfaces consist of concise functions with similar signatures.
- Read/write interfaces allow passing keyword arguments to the underlying I/O functions to preserve flexibility.
- Read operations can be performed lazily using generators.
- Before reading/writing, the user can specify a callback function that will be applied to each element of the data stream.
- Read functions have load aliases (e.g., `read_csv` has a `load_csv` alias) and write functions have save aliases (e.g., `write_csv` has a `save_csv` alias).
- An efficient library is used for each format (e.g., `orjson` for `json` files). Suggestions are welcome!
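
To make the principles above concrete, here is a minimal standard-library sketch of a reader that supports a per-row callback, lazy reading, keyword pass-through, and an alias. It does not show unified-io's implementation: the names `sketch_read_csv` and `sketch_load_csv` are hypothetical, and the use of `csv.DictReader` as the underlying reader is an assumption made for illustration.

```python
import csv
from typing import Any, Callable, Dict, Iterator, List, Optional, Union

Row = Dict[str, str]


def sketch_read_csv(
    path: str,
    callback: Optional[Callable[[Row], Any]] = None,
    generator: bool = False,
    **kwargs: Any,
) -> Union[Iterator[Any], List[Any]]:
    """Hypothetical reader for illustration; not unified-io's actual code."""

    def rows() -> Iterator[Any]:
        # newline="" is what the csv module recommends for file objects.
        with open(path, newline="", encoding="utf-8") as f:
            # Extra keyword arguments are forwarded to csv.DictReader.
            for row in csv.DictReader(f, **kwargs):
                # Apply the callback to each row as it streams past.
                yield callback(row) if callback is not None else row

    stream = rows()
    # Lazy mode hands back the generator; eager mode materialises a list.
    return stream if generator else list(stream)


# A "load" alias is simply a second name bound to the same function.
sketch_load_csv = sketch_read_csv
```

With this sketch, `sketch_read_csv("input.csv", delimiter=";")` would return a list of dictionaries, while passing `generator=True` would return an iterator that opens the file only once the first row is requested.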
Requirements: python>=3.7
Installation: pip install unified-io
The API is designed to be as simple as possible. For example, the following code snippet reads a CSV file, applies a callback function to each element of the data stream, and writes the result to a JSONL file:
from unified_io import read_csv, write_jsonl


def callback(x):
    return {"id": x["id"], "title": x["title"].lower()}


# Using a generator we avoid loading the entire file into memory
data = read_csv('input.csv', callback=callback, generator=True)
write_jsonl('output.jsonl', data)
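
To illustrate why passing a generator keeps memory use low on the write side, the following standard-library sketch consumes any iterable one item at a time and writes one JSON document per line. The function `sketch_write_jsonl` is a hypothetical stand-in, not the library's `write_jsonl`, and it uses the built-in `json` module rather than `orjson` to stay dependency-free.

```python
import json
from typing import Any, Iterable


def sketch_write_jsonl(path: str, data: Iterable[Any], **kwargs: Any) -> int:
    """Hypothetical JSONL writer for illustration; returns the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for item in data:
            # Items are pulled one at a time, so a generator is never
            # materialised in full; kwargs are forwarded to json.dumps.
            f.write(json.dumps(item, **kwargs) + "\n")
            count += 1
    return count
```

Because the loop pulls one item per iteration, a lazy reader feeding this writer would hold only a single record in memory at any moment.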
Browse the documentation for more details and examples.
Would you like to see other features implemented? Please open a feature request.
Would you like to contribute? Please drop me an e-mail.
unified-io is open-source software licensed under the MIT license.