libmlops - Home grown Machine Learning Operations
The recommended directory structure for MLOps (Machine Learning Operations) projects can vary depending on the specific tools, frameworks, and practices used. However, here's a common and flexible directory structure that you can use as a starting point:
project_root/ │ ├── bin/ │ ├── config/ │ ├── init.py │ ├── config.py │ └── parameters.yaml │ ├── data/ │ ├── raw/ │ ├── processed/ │ └── external/ │ ├── docs/ │ ├── notebooks/ │ ├── scripts/ │ ├── libmlops/ │ ├── data/ │ │ ├── init.py │ │ ├── data_preprocessing.py │ │ └── data_loading.py │ │ │ ├── features/ │ │ ├── init.py │ │ ├── feature_engineering.py │ │ └── feature_selection.py │ │ │ ├── models/ │ │ ├── init.py │ │ ├── model_training.py │ │ └── model_evaluation.py │ │ │ └── utils/ │ ├── init.py │ ├── logger.py │ └── other_utils.py │ ├── tests/ │ ├── README.md │ └── pyproject.toml
Let's briefly explain the purpose of each directory:
bin/: Executable scripts or binaries.
config/: Configuration files and settings for your project.
data/: This directory contains raw, processed, and external data. external/: This directory often contains datasets that are obtained from external sources. The data in this directory is typically considered read-only and is not modified by the project itself. It may include raw data files, datasets acquired from external APIs, or other sources that are used for analysis or training. raw/: This directory usually holds the raw, unprocessed data obtained from various sources, both internal and external. Raw data might be in its original form and may include data files, CSVs, Excel sheets, images, or any other data format. The contents of this directory may be modified during the data preprocessing phase, but the original data is generally preserved. processed/: This directory typically contains data that has undergone some level of processing or transformation. Processed data may include cleaned datasets, feature-engineered data, or any other data that has been manipulated for analysis or model training. The contents of this directory are often the output of data preprocessing scripts or workflows.
docs/: Documentation for the project.
notebooks/: Jupyter notebooks for exploratory data analysis (EDA), experimentation, and documentation.
scripts/: Any scripts that are used in the project but aren't part of the main codebase.
libmlops/: Source code for your machine learning pipeline. data/: Code related to data loading and preprocessing. features/: Code for feature engineering and selection. models/: Code for defining, training, and evaluating machine learning models. utils/: Utility functions and helper code.
tests/: Unit tests and integration tests for your code.
README.md: Project documentation, including setup instructions, usage, and any other relevant information.
pyproject.toml: List of Python dependencies for your project.
This structure provides a clear separation of concerns and makes it easier to maintain and scale your MLOps project. Keep in mind that this is just a starting point, and you may need to adjust it based on the specific requirements and tools used in your project.