A suite of PySpark, pandas and general pipeline utilities for Reproducible Data Science and Analysis (RDSA) projects.
The RDSA team sits within the Economic Statistics Change Directorate and uses cutting-edge data science and engineering skills to produce the next generation of economic statistics. Current priorities include overhauling legacy systems and developing new systems for key statistics related to the economic impact of Brexit, the COVID-19 pandemic, and inflation.
rdsa-utils is a Python codebase that requires Python 3.8 or higher and uses Poetry for dependency management and packaging.
- Python 3.8 or higher
- Poetry
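
As a quick way to confirm the two prerequisites above before installing, the following is a minimal sketch using only the Python standard library. The helper names `meets_minimum` and `poetry_available` are hypothetical and are not part of rdsa-utils; the check for Poetry assumes its executable is named `poetry` and is on your `PATH`.

```python
import shutil
import sys

# Minimum interpreter version stated in the prerequisites above.
MIN_PYTHON = (3, 8)


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def poetry_available():
    """Return True if an executable named `poetry` is found on PATH (assumption)."""
    return shutil.which("poetry") is not None


if __name__ == "__main__":
    print("Python version OK:", meets_minimum())
    print("Poetry found on PATH:", poetry_available())
```

If either check reports `False`, install or upgrade the missing prerequisite before setting up the project.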
Our documentation is automatically generated using GitHub Actions and MkDocs. For an in-depth understanding of rdsa-utils, how to contribute to rdsa-utils, and more, please refer to our MkDocs-generated documentation.