LLaVA-O1 is an open-source framework for training, inference, and evaluation of multimodal large language models (MLLMs), built on PyTorch and HuggingFace. It aims to streamline the development of robust multimodal models by integrating efficient workflows with popular libraries.
- Flexible Training Pipelines: Support for various training modes including single-GPU, multi-GPU, and distributed training.
- Inference with HuggingFace Integration: Seamlessly conduct inference using HuggingFace’s model hub.
- Extensive Evaluation: Built-in metrics and evaluation tools for assessing model performance.
- Multimodal Compatibility: Designed to handle multimodal tasks, enabling diverse applications from text-only tasks to multimodal reasoning.
To install and set up LLaVA-O1, you need to have Python 3.8+ and PyTorch installed. Follow these steps to set up the environment:
# Clone the repository
git clone https://github.com/White65534/LLaVA-O1.git
cd LLaVA-O1
# Create a virtual environment
python -m venv llava_env
source llava_env/bin/activate # On Windows, use `llava_env\Scripts\activate`
# Install dependencies
pip install -r requirements.txt
If you plan to use GPU support, ensure you have the appropriate CUDA version installed.
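Before training or inference, it can help to confirm that your environment is set up correctly. The snippet below is a convenience check, not part of the LLaVA-O1 repository; it reports the Python version and whether PyTorch can reach a CUDA device:

```python
# Convenience environment check (not part of LLaVA-O1):
# reports Python version compatibility and CUDA availability.
import sys

def check_environment():
    info = {"python_ok": sys.version_info >= (3, 8)}
    try:
        import torch
        info["torch_version"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        # PyTorch is not installed yet
        info["torch_version"] = None
        info["cuda_available"] = False
    return info

print(check_environment())
```

If `cuda_available` is `False` on a GPU machine, reinstall PyTorch with a build matching your CUDA version.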
After installation, you can quickly start by running a simple inference or training example. Here’s how:
- Model Configuration: Define your model configuration by editing the `config.yaml` file.
- Dataset Preparation: Organize your data under the `data/` directory, following the example dataset structures provided in `examples/`.
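The exact schema of `config.yaml` depends on the repository; purely as an illustration, a minimal configuration could look like the following (all field names here are assumptions, not the project's actual keys):

```yaml
# Illustrative only -- these field names are assumptions, not LLaVA-O1's actual schema.
model:
  name_or_path: <huggingface-model-id>   # model to load from the HuggingFace hub
training:
  batch_size: 8
  learning_rate: 2.0e-5
  epochs: 3
data:
  train_dir: data/train
  eval_dir: data/eval
```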
To run a simple inference with LLaVA-O1, use the following command:
python scripts/inference.py --config config.yaml --input data/sample_input.txt
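Judging from the commands in this README, `scripts/inference.py` accepts `--config` and `--input` flags. A hypothetical sketch of such a command-line interface (the actual script may define more options) could be built with `argparse`:

```python
# Hypothetical sketch of the inference CLI implied by this README's commands;
# the actual scripts/inference.py may differ.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="LLaVA-O1 inference")
    parser.add_argument("--config", required=True, help="Path to config.yaml")
    parser.add_argument("--input", required=True, help="Input file to run inference on")
    return parser

# Parse the same flags used in the example command above.
args = build_parser().parse_args(
    ["--config", "config.yaml", "--input", "data/sample_input.txt"]
)
print(args.config, args.input)
```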
LLaVA-O1 supports both single-GPU and distributed training using PyTorch. To train a model, modify the training configurations in `config.yaml` and then run:
python scripts/train.py --config config.yaml
For distributed training, use:
torchrun --nproc_per_node=NUM_GPUS scripts/train.py --config config.yaml
Replace `NUM_GPUS` with the number of GPUs available for training.
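Under `torchrun`, each worker process receives its rank and world size through environment variables. A training script typically reads them as shown below; this is a generic sketch, not the repository's actual code:

```python
# Sketch (assumption, not LLaVA-O1's actual code): reading the environment
# variables torchrun sets for each spawned worker process.
import os

def get_dist_info():
    """Return (rank, world_size, local_rank), with single-process defaults."""
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    return rank, world_size, local_rank

# Run directly (without torchrun), the single-process defaults apply.
print(get_dist_info())
```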
After training, you can perform inference on new data. Specify the model path in `config.yaml` and run:
python scripts/inference.py --config config.yaml --input data/inference_data.txt
To evaluate the model on a test set, use the evaluation script:
python scripts/evaluate.py --config config.yaml --input data/test_data.txt
The evaluation reports metrics such as:
- Accuracy
- Precision/Recall/F1 Scores
- Task-Specific Metrics (e.g., BLEU and ROUGE for text generation)
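For intuition, precision, recall, and F1 can be derived from true-positive, false-positive, and false-negative counts. This is a generic sketch of those formulas, not necessarily how `scripts/evaluate.py` implements them:

```python
# Generic precision/recall/F1 from raw counts (illustrative; the repo's
# evaluate.py may compute these differently, e.g. per-class or macro-averaged).
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

print(precision_recall_f1(8, 2, 2))
```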
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch with your feature/bugfix.
- Submit a pull request with a clear description of your changes.
This project is licensed under the MIT License. See the `LICENSE` file for more information.